HOUSE PRICE PREDICTION

Structure:

  1. Introduction
  2. Data Loading
  3. EDA - Univariate
  4. EDA - Bivariate
  5. Data Preprocessing
  6. Model Building with Dataset-1
  7. Hypertuning Dataset-1
  8. Summary - Dataset-1
  9. Model Building with Dataset-2
  10. Hypertuning Dataset-2
  11. Summary - Dataset -2
  12. Conclusion
  13. Pickle file creation

Note:
Dataset - 1 = 22 features
['price', 'room_bed', 'room_bath', 'living_measure', 'lot_measure', 'ceil', 'coast', 'sight', 'condition', 'quality', 'ceil_measure', 'basement', 'yr_built', 'living_measure15', 'lot_measure15', 'furnished', 'total_area', 'month_year', 'City', 'has_basement', 'HouseLandRatio', 'has_renovated']


Dataset - 2 = 31 features (important features after imputing dummy and analyzing different models)
['price', 'room_bed', 'room_bath', 'living_measure', 'lot_measure', 'ceil', 'sight', 'condition', 'ceil_measure', 'basement', 'yr_built', 'yr_renovated', 'zipcode', 'lat', 'long', 'living_measure15', 'lot_measure15', 'total_area', 'coast_1', 'quality_3', 'quality_4', 'quality_5', 'quality_6', 'quality_7', 'quality_8', 'quality_9', 'quality_10', 'quality_11', 'quality_12', 'quality_13', 'furnished_1'

Prerequisites for the running the file:

Below are 2 files needed to be added to you current working directory.

 1. Need to add file USA ZipCodes_1.xlsx to current working directory to access this data 
 2. Add the folder WA to your current working directory
 3. Install below 2 libraries
    conda install -c conda-forge/label/cf201901 geopandas 
    conda install -c conda-forge/label/cf201901 shapely 

</b>

This Jupyter Notebook is done as part of PGPML Great Learning Programme for Capstone Project. Let's first, define the problem, objective of this excercise.

We have the problem statment well defined in the given document which is as follows

INTRODUCTION

Problem Statement

As a house value is simply more than location and square footage. Like the features that make up a person, an educated party would want to know all aspects that give a house its value. For example, if we want to sell a house and we don't know the price which we can take, as it can't be too low or too high. To find house price we usually try to find similar properties in our neighbourhood and based on collected data we trying to assess our house price.

Problem Definition

When any person/business wants to sell or buy a house, they always face this kind of issue as they don't know the price which they should offer. Due to this they might be offering too low or high for the property. Therefore, we can analyze the available data of the properties in the area and can predict the price. We need to find how these attributes influence the house prices Right pricing is very imporatnt aspect to sell house. It is very important to understand what are the factors and how they influence the house price. Objective is to predict the right price of the house based on the attributes

Objective

Build model which will predict the house price when required features passed to the model. So we will

  • Find out the significant features from the given features dataset which affects the house price the most.
  • Build best feasible model to predict the house price with 95% confidence level
  • Business Reason

    As people don't know the features/aspects which commulate property price, we can provide them HouseBuyingSelling guiding services in the area so they can buy or sell their property with most suitable price tag and they didn't lose their hard earned money by offering low price or keep waiting for buyers by putting high prices.

    DATA LOADING

    First, we will load the data from the given csv(comma seperated values) file provided as part of the Capstone Project.

    In [2]:
    # loading the library required for data loading and processing
    import pandas as pd   
    import numpy as np 
    
    #Supress warnings
    import warnings
    warnings.filterwarnings('ignore')
    
    # read the data using pandas function from 'innercity.csv' file
    house_df = pd.read_csv('innercity.csv')
    
    In [3]:
    # let's check whether data loaded successfully or not, by checking first few records
    house_df.head()
    
    Out[3]:
    cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... basement yr_built yr_renovated zipcode lat long living_measure15 lot_measure15 furnished total_area
    0 3034200666 20141107T000000 808100 4 3.25 3020 13457 1.0 0 0 ... 0 1956 0 98133 47.7174 -122.336 2120 7553 1 16477
    1 8731981640 20141204T000000 277500 4 2.50 2550 7500 1.0 0 0 ... 800 1976 0 98023 47.3165 -122.386 2260 8800 0 10050
    2 5104530220 20150420T000000 404000 3 2.50 2370 4324 2.0 0 0 ... 0 2006 0 98038 47.3515 -121.999 2370 4348 0 6694
    3 6145600285 20140529T000000 300000 2 1.00 820 3844 1.0 0 0 ... 0 1916 0 98133 47.7049 -122.349 1520 3844 0 4664
    4 8924100111 20150424T000000 699000 2 1.50 1400 4050 1.0 0 0 ... 0 1954 0 98115 47.6768 -122.269 1900 5940 0 5450

    5 rows × 23 columns

    Data is loaded successfully as we can see first 5 records from the dataset.

    Data Understanding

    After loading data into our pandas library dataframe, we can now try to understand the kind of data we have with us.

    In [4]:
    # print the number of records and features/aspects we have in the provided file
    house_df.shape
    
    Out[4]:
    (21613, 23)

    We have more than 21k records having 23 features

    In [5]:
    # let's check out the columns/features we have in the dataset
    
    house_df.columns
    
    Out[5]:
    Index(['cid', 'dayhours', 'price', 'room_bed', 'room_bath', 'living_measure',
           'lot_measure', 'ceil', 'coast', 'sight', 'condition', 'quality',
           'ceil_measure', 'basement', 'yr_built', 'yr_renovated', 'zipcode',
           'lat', 'long', 'living_measure15', 'lot_measure15', 'furnished',
           'total_area'],
          dtype='object')

    From the above we can see the different columns we have in dataset.

    These columns provide below information

    1. cid: Notation for a house. Will not of our use. So we will drop this column
    2. dayhours: Represents Date, when house was sold.
    3. price: It's our TARGET feature, that we have to predict based on other featues
    4. room_bed: Represents number of bedrooms in a house
    5. room_bath: Represents number of bathrooms
    6. living_measure: Represents square footage of house
    7. lot_measure: Represents square footage of lot
    8. ceil: Represents number of floors in house
    9. coast: Represents whether house has waterfront view. It seems to be a categorical variable. We will see in our further data analysis
    10. sight: Represents how many times sight has been viewed.
    11. condition: Represents the overall condition of the house. It's kind of rating given to the house.
    12. quality: Represents grade given to the house based on grading system
    13. ceil_measure: Represents square footage of house apart from basement
    14. basement: Represents square footage of basement
    15. yr_built: Represents the year when house was built
    16. yr_renovated: Represents the year when house was last renovated
    17. zipcode: Represents zipcode as name implies
    18. lat: Represents Lattitude co-ordniates
    19. long: Represents Longitude co-ordinates
    20. living_measure15: Represents square footage of house, when measured in 2015 year as house area may or may not changed after renovation if any happened
    21. lot_measure15: Represents square footage of lot, when measured in 2015 year as lot area may or may not change after renovation if any done
    22. furnished: Tells whether house is furnished or not. It seems to be categorical variable as description implies
    23. total_area: Represents total area i.e. area of both living and lot
    In [6]:
    # let's see the data types of the features
    house_df.info()
    
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 21613 entries, 0 to 21612
    Data columns (total 23 columns):
    cid                 21613 non-null int64
    dayhours            21613 non-null object
    price               21613 non-null int64
    room_bed            21613 non-null int64
    room_bath           21613 non-null float64
    living_measure      21613 non-null int64
    lot_measure         21613 non-null int64
    ceil                21613 non-null float64
    coast               21613 non-null int64
    sight               21613 non-null int64
    condition           21613 non-null int64
    quality             21613 non-null int64
    ceil_measure        21613 non-null int64
    basement            21613 non-null int64
    yr_built            21613 non-null int64
    yr_renovated        21613 non-null int64
    zipcode             21613 non-null int64
    lat                 21613 non-null float64
    long                21613 non-null float64
    living_measure15    21613 non-null int64
    lot_measure15       21613 non-null int64
    furnished           21613 non-null int64
    total_area          21613 non-null int64
    dtypes: float64(4), int64(18), object(1)
    memory usage: 3.8+ MB
    

    In the dataset, we have more than 21k records and 23 columns, out of which

  • 4 features are of float type
  • 18 features are of integer type
  • 1 feature is of object type (we may need to convert this object type to specific datatype)
  • In [7]:
    # let's check whether our dataset have any null/missing values
    house_df.isnull().sum()
    
    Out[7]:
    cid                 0
    dayhours            0
    price               0
    room_bed            0
    room_bath           0
    living_measure      0
    lot_measure         0
    ceil                0
    coast               0
    sight               0
    condition           0
    quality             0
    ceil_measure        0
    basement            0
    yr_built            0
    yr_renovated        0
    zipcode             0
    lat                 0
    long                0
    living_measure15    0
    lot_measure15       0
    furnished           0
    total_area          0
    dtype: int64

    We don't have any null or missing values for any of the columns

    In [8]:
    # let's check whether there's any duplicate record in our dataset or not. If present, we have to remove them
    house_df.duplicated().sum()
    
    Out[8]:
    0

    We don't have any duplicate record in out dataset. So we can say we have more than 21k Unique records

    In [9]:
    # let's do the 5 - factor analysis of the features
    
    house_df.describe().transpose()
    
    Out[9]:
    count mean std min 25% 50% 75% max
    cid 21613.0 4.580302e+09 2.876566e+09 1.000102e+06 2.123049e+09 3.904930e+09 7.308900e+09 9.900000e+09
    price 21613.0 5.401822e+05 3.673622e+05 7.500000e+04 3.219500e+05 4.500000e+05 6.450000e+05 7.700000e+06
    room_bed 21613.0 3.370842e+00 9.300618e-01 0.000000e+00 3.000000e+00 3.000000e+00 4.000000e+00 3.300000e+01
    room_bath 21613.0 2.114757e+00 7.701632e-01 0.000000e+00 1.750000e+00 2.250000e+00 2.500000e+00 8.000000e+00
    living_measure 21613.0 2.079900e+03 9.184409e+02 2.900000e+02 1.427000e+03 1.910000e+03 2.550000e+03 1.354000e+04
    lot_measure 21613.0 1.510697e+04 4.142051e+04 5.200000e+02 5.040000e+03 7.618000e+03 1.068800e+04 1.651359e+06
    ceil 21613.0 1.494309e+00 5.399889e-01 1.000000e+00 1.000000e+00 1.500000e+00 2.000000e+00 3.500000e+00
    coast 21613.0 7.541757e-03 8.651720e-02 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00
    sight 21613.0 2.343034e-01 7.663176e-01 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 4.000000e+00
    condition 21613.0 3.409430e+00 6.507430e-01 1.000000e+00 3.000000e+00 3.000000e+00 4.000000e+00 5.000000e+00
    quality 21613.0 7.656873e+00 1.175459e+00 1.000000e+00 7.000000e+00 7.000000e+00 8.000000e+00 1.300000e+01
    ceil_measure 21613.0 1.788391e+03 8.280910e+02 2.900000e+02 1.190000e+03 1.560000e+03 2.210000e+03 9.410000e+03
    basement 21613.0 2.915090e+02 4.425750e+02 0.000000e+00 0.000000e+00 0.000000e+00 5.600000e+02 4.820000e+03
    yr_built 21613.0 1.971005e+03 2.937341e+01 1.900000e+03 1.951000e+03 1.975000e+03 1.997000e+03 2.015000e+03
    yr_renovated 21613.0 8.440226e+01 4.016792e+02 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 2.015000e+03
    zipcode 21613.0 9.807794e+04 5.350503e+01 9.800100e+04 9.803300e+04 9.806500e+04 9.811800e+04 9.819900e+04
    lat 21613.0 4.756005e+01 1.385637e-01 4.715590e+01 4.747100e+01 4.757180e+01 4.767800e+01 4.777760e+01
    long 21613.0 -1.222139e+02 1.408283e-01 -1.225190e+02 -1.223280e+02 -1.222300e+02 -1.221250e+02 -1.213150e+02
    living_measure15 21613.0 1.986552e+03 6.853913e+02 3.990000e+02 1.490000e+03 1.840000e+03 2.360000e+03 6.210000e+03
    lot_measure15 21613.0 1.276846e+04 2.730418e+04 6.510000e+02 5.100000e+03 7.620000e+03 1.008300e+04 8.712000e+05
    furnished 21613.0 1.966872e-01 3.975030e-01 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00
    total_area 21613.0 1.718687e+04 4.158908e+04 1.423000e+03 7.035000e+03 9.575000e+03 1.300000e+04 1.652659e+06
    1. CID: House ID/Property ID.Not used for analysis
    2. Dayhours: 5 factor analysis is reflecting for this column
    3. price: Our taget column value is in 75k - 7700k range. As Mean > Median, it's Right-Skewed.
    4. room_bed: Number of bedrooms range from 0 - 33. As Mean slightly > Median, it's slightly Right-Skewed.
    5. room_bath: Number of bathrooms range from 0 - 8. As Mean slightly < Median, it's slightly Left-Skewed.
    6. living_measure: Square footage of house range from 290 - 13,540. As Mean > Median, it's Right-Skewed.
    7. lot_measure: Square footage of lot range from 520 - 16,51,359. As Mean almost double of Median, it's Hightly Right-Skewed.
    8. ceil: Number of floors range from 1 - 3.5 As Mean ~ Median, it's almost Normal Distributed.
    9. coast: As this value represent whether house has waterfront view or not. It's categorical column. From above analysis we got know, very few houses has waterfront view.
    10. sight: Value ranges from 0 - 4. As Mean > Median, it's Right-Skewed
    11. condition: Represents rating of house which ranges from 1 - 5. As Mean > Median, it's Right-Skewed
    12. quality: Representign grade given to house which range from 1 - 13. As Mean > Median, it's Right-Skewed.
    13. ceil_measure: Square footage of house apart from basement ranges in 290 - 9,410. As Mean > Median, it's Right-Skewed.
    14. basement: Square footage house basement ranges in 0 - 4,820. As Mean highlty > Median, it's Highly Right-Skewed.
    15. yr_built: House built year ranges from 1900 - 2015. As Mean < Median, it's Left-Skewed.
    16. yr_renovated: House renovation year only 2015. So this column can be used as Categorical Variable for knowing whether house is renovated or not.
    17. zipcode: House ZipCode ranges from 98001 - 98199. As Mean > Median, it's Right-Skewed.
    18. lat: Lattitude ranges from 47.1559 - 47.7776 As Mean < Median, it's Left-Skewed.
    19. long: Longittude ranges from -122.5190 to -121.315 As Mean > Median, it's Right-Skewed.
    20. living_measure15: Value ragnes from 399 to 6,210. As Mean > Median, it's Right-Skewed.
    21. lot_measure15: Value ragnes from 651 to 8,71,200. As Mean highly > Median, it's Highly Right-Skewed.
    22. furnished: Representing whether house is furnished or not. It's a Categorical Variable
    23. total_area Total area of house ranges from 1,423 to 16,52,659. As Mean is almost double of Median, it's Highly Right-Skewed

    From above analysis we got to know,

    Most columns distribution is Right-Skewed and only few features are Left-Skewed (like room_bath, yr_built, lat).

    We have columns which are Categorical in nature are -> coast, yr_renovated, furnished

    Exploratory Data Analysis

    Let's do some visual data analysis of the features

    Univariate Analysis - By BoxPlot

    In [10]:
    #let's first import the required libraries for the plots
    import matplotlib.pyplot as plt
    import seaborn as sns
    %matplotlib inline
    
    # size of plots to make it uniform throughout our analysis in the notebook
    plotSizeX = 12
    plotSizeY = 6
    # let's boxplot all the numerical columns and see if there any outliers
    for i in house_df.iloc[:, 2:].columns:
        house_df.iloc[:, 1:].boxplot(column=i)
        plt.show()
    

    We can see, there are lot of features which have outliers. So we might need to treat those before building model

    Analyzing Feature: cid

    In [11]:
    #cid - CID is appearing muliple times, it seems data contains house which is sold multiple times
    cid_count=house_df.cid.value_counts()
    cid_count[cid_count>1].shape
    
    Out[11]:
    (176,)

    We have 176 properties that were sold more than once in the given data

    Analyzing Feature: dayhours

    In [12]:
    #we will create new data frame that can be used for modeling
    #We will convert the dayhours to 'month_year' as sale month-year is relevant for analysis
    
    house_dfr=house_df.copy()
    house_df.dayhours=house_df.dayhours.str.replace('T000000', "")
    house_df.dayhours=pd.to_datetime(house_df.dayhours,format='%Y%m%d')
    house_df['month_year']=house_df['dayhours'].apply(lambda x: x.strftime('%B-%Y'))
    house_df['month_year'].head()
    
    Out[12]:
    0    November-2014
    1    December-2014
    2       April-2015
    3         May-2014
    4       April-2015
    Name: month_year, dtype: object

    We successfully converted dayhours feature to month_year for better analysis.

    In [13]:
    house_df['month_year'].value_counts()
    
    Out[13]:
    April-2015        2231
    July-2014         2211
    June-2014         2180
    August-2014       1940
    October-2014      1878
    March-2015        1875
    September-2014    1774
    May-2014          1768
    December-2014     1471
    November-2014     1411
    February-2015     1250
    January-2015       978
    May-2015           646
    Name: month_year, dtype: int64

    We can see, most houses sold in April, July month

    In [14]:
    house_df.groupby(['month_year'])['price'].agg('mean')
    
    Out[14]:
    month_year
    April-2015        561933.463021
    August-2014       536527.039691
    December-2014     524602.893270
    February-2015     507919.603200
    January-2015      525963.251534
    July-2014         544892.161013
    June-2014         558123.736239
    March-2015        544057.683200
    May-2014          548166.600113
    May-2015          558193.095975
    November-2014     522058.861800
    October-2014      539127.477636
    September-2014    529315.868095
    Name: price, dtype: float64

    So the time line of the sale data of the properties is from May-2014 to May-2015 and April month have the highest mean price.

    Analyzing Feature: Price (our Target)

    In [15]:
    house_df.price.describe()
    
    Out[15]:
    count    2.161300e+04
    mean     5.401822e+05
    std      3.673622e+05
    min      7.500000e+04
    25%      3.219500e+05
    50%      4.500000e+05
    75%      6.450000e+05
    max      7.700000e+06
    Name: price, dtype: float64
    In [16]:
    plt.figure(figsize=(plotSizeX, plotSizeY))
    sns.distplot(house_df['price'])
    
    Out[16]:
    <matplotlib.axes._subplots.AxesSubplot at 0x225ef84d550>

    The Price is ranging from 75,000 to 77,00,000 and distribution is right-skewed.

    Analyzing Feature: room_bed
    In [17]:
    house_df['room_bed'].value_counts()
    
    Out[17]:
    3     9824
    4     6882
    2     2760
    5     1601
    6      272
    1      199
    7       38
    8       13
    0       13
    9        6
    10       3
    11       1
    33       1
    Name: room_bed, dtype: int64

    The value of 33 seems to be outlier we need to check the data point before imputing the same

    In [18]:
    house_df[house_df['room_bed']==33]
    
    Out[18]:
    cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... yr_built yr_renovated zipcode lat long living_measure15 lot_measure15 furnished total_area month_year
    750 2402100895 2014-06-25 640000 33 1.75 1620 6000 1.0 0 0 ... 1947 0 98103 47.6878 -122.331 1330 4700 0 7620 June-2014

    1 rows × 24 columns

    Will delete this data point after bivariate analysis as it looks to be an outlier as it has low price for 33 bed room property

    In [19]:
    plt.figure(figsize=(plotSizeX, plotSizeY))
    sns.countplot(house_df.room_bed,color='green')
    
    Out[19]:
    <matplotlib.axes._subplots.AxesSubplot at 0x225ef14f780>

    Most of the houses/properties have 3 or 4 bedrooms

    Analyzing Feature: room_bath

    In [20]:
    plt.figure(figsize=(plotSizeX, plotSizeY))
    sns.countplot(house_df.room_bath,color='green')
    house_df['room_bath'].value_counts().sort_index()
    
    Out[20]:
    0.00      10
    0.50       4
    0.75      72
    1.00    3852
    1.25       9
    1.50    1446
    1.75    3048
    2.00    1930
    2.25    2047
    2.50    5380
    2.75    1185
    3.00     753
    3.25     589
    3.50     731
    3.75     155
    4.00     136
    4.25      79
    4.50     100
    4.75      23
    5.00      21
    5.25      13
    5.50      10
    5.75       4
    6.00       6
    6.25       2
    6.50       2
    6.75       2
    7.50       1
    7.75       1
    8.00       2
    Name: room_bath, dtype: int64

    Majority of the properties have bathroom in the range of 1.0 to 2.5

    In [21]:
    plt.figure(figsize=(plotSizeX, plotSizeY))
    print("Skewness is :",house_df.room_bath.skew())
    sns.distplot(house_df.room_bath)
    
    Skewness is : 0.511107573347417
    
    Out[21]:
    <matplotlib.axes._subplots.AxesSubplot at 0x225ef14f748>

    Analyzing Feature: Living measure

    In [22]:
    #Data is skewed as visible from plot, as its distribution is normal
    plt.figure(figsize=(plotSizeX, plotSizeY))
    print("Skewness is :",house_df.living_measure.skew())
    sns.distplot(house_df.living_measure)
    house_df.living_measure.describe()
    
    Skewness is : 1.471555426802092
    
    Out[22]:
    count    21613.000000
    mean      2079.899736
    std        918.440897
    min        290.000000
    25%       1427.000000
    50%       1910.000000
    75%       2550.000000
    max      13540.000000
    Name: living_measure, dtype: float64

    Data distribution tells us, living_measure is right-skewed.

    In [23]:
    #Let's plot the boxplot for living_measure
    plt.figure(figsize=(plotSizeX, plotSizeY))
    sns.boxplot(house_df.living_measure)
    
    Out[23]:
    <matplotlib.axes._subplots.AxesSubplot at 0x225ef0f5b70>

    There are many outliers in living measure. Need to review further to treat the same.

    In [24]:
    #checking the no. of data points with Living measure greater than 8000
    house_df[house_df['living_measure']>8000]
    
    Out[24]:
    cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... yr_built yr_renovated zipcode lat long living_measure15 lot_measure15 furnished total_area month_year
    264 9208900037 2014-09-19 6890000 6 7.75 9890 31374 2.0 0 4 ... 2001 0 98039 47.6305 -122.240 4540 42730 1 41264 September-2014
    668 1924059029 2014-06-17 4670000 5 6.75 9640 13068 1.0 1 4 ... 1983 2009 98040 47.5570 -122.210 3270 10454 1 22708 June-2014
    1123 2303900035 2014-06-11 2890000 5 6.25 8670 64033 2.0 0 4 ... 1965 2003 98177 47.7295 -122.372 4140 81021 1 72703 June-2014
    4789 1247600105 2014-10-20 5110000 5 5.25 8010 45517 2.0 1 4 ... 1999 0 98033 47.6767 -122.211 3430 26788 1 53527 October-2014
    16785 6762700020 2014-10-13 7700000 6 8.00 12050 27600 2.5 0 3 ... 1910 1987 98102 47.6298 -122.323 3940 8800 1 39650 October-2014
    18393 6072800246 2014-07-02 3300000 5 6.25 8020 21738 2.0 0 0 ... 2001 0 98006 47.5675 -122.189 4160 18969 1 29758 July-2014
    19888 9808700762 2014-06-11 7060000 5 4.50 10040 37325 2.0 1 2 ... 1940 2001 98004 47.6500 -122.214 3930 25449 1 47365 June-2014
    20740 1225069038 2014-05-05 2280000 7 8.00 13540 307752 3.0 0 4 ... 1999 0 98053 47.6675 -121.986 4850 217800 1 321292 May-2014
    20917 2470100110 2014-08-04 5570000 5 5.75 9200 35069 2.0 0 0 ... 2001 0 98039 47.6289 -122.233 3560 24345 1 44269 August-2014

    9 rows × 24 columns

    We have only 9 properties/house which have more than 8k living_measure. So will treat these outliers.

    Analyzing Feature: lot_measure

    In [25]:
    #Data is skewed as visible from plot
    plt.figure(figsize=(plotSizeX, plotSizeY))
    print("Skewness is :",house_df.lot_measure.skew())
    sns.boxplot(house_df.lot_measure)
    house_df.lot_measure.describe()
    
    Skewness is : 13.06001895903175
    
    Out[25]:
    count    2.161300e+04
    mean     1.510697e+04
    std      4.142051e+04
    min      5.200000e+02
    25%      5.040000e+03
    50%      7.618000e+03
    75%      1.068800e+04
    max      1.651359e+06
    Name: lot_measure, dtype: float64
    In [26]:
    #checking the no. of data points with Lot measure greater than 1250000
    house_df[house_df['lot_measure']>1250000]
    
    Out[26]:
    cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... yr_built yr_renovated zipcode lat long living_measure15 lot_measure15 furnished total_area month_year
    1113 1020069017 2015-03-27 700000 4 1.0 1300 1651359 1.0 0 3 ... 1920 0 98022 47.2313 -122.023 2560 425581 0 1652659 March-2015

    1 rows × 24 columns

    We have only 1 property with more than 12,50,000 lot_measure. So we need to treat this.

    Analyzing Feature: ceil

    In [27]:
    #let's see the ceil count for all the records
    house_df.ceil.value_counts()
    
    Out[27]:
    1.0    10680
    2.0     8241
    1.5     1910
    3.0      613
    2.5      161
    3.5        8
    Name: ceil, dtype: int64

    We can see, most houses have 1 floor

    In [28]:
    plt.figure(figsize=(plotSizeX, plotSizeY))
    sns.countplot('ceil',data=house_df)
    
    Out[28]:
    <matplotlib.axes._subplots.AxesSubplot at 0x225ef19ff60>

    Above grapth confirming the same, that most properties have 1 and 2 floors

    Analyzing Feature: coast

    In [29]:
    #coast - most houses donot have waterfront view, very few are waterfront
    house_df.coast.value_counts()
    
    Out[29]:
    0    21450
    1      163
    Name: coast, dtype: int64

    Analyzing Feature: sight

    In [30]:
    #sight - most sights have not been viewed
    house_df.sight.value_counts()
    
    Out[30]:
    0    19489
    2      963
    3      510
    1      332
    4      319
    Name: sight, dtype: int64

    Analyzing Feature: condition

    In [31]:
    #condition - Overall most houses are rated as 3 and above for its condition overall
    house_df.condition.value_counts()
    
    Out[31]:
    3    14031
    4     5679
    5     1701
    2      172
    1       30
    Name: condition, dtype: int64

    Analyzing Feature: quality

    In [32]:
    #Quality - most properties have quality rating between 6 to 10
    house_df.quality.value_counts()
    plt.figure(figsize=(plotSizeX, plotSizeY))
    sns.countplot('quality',data=house_df)
    
    Out[32]:
    <matplotlib.axes._subplots.AxesSubplot at 0x225eedbd358>
    In [33]:
    #checking the no. of data points with quality rating as 13
    house_df[house_df['quality']==13]
    
    Out[33]:
    cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... yr_built yr_renovated zipcode lat long living_measure15 lot_measure15 furnished total_area month_year
    264 9208900037 2014-09-19 6890000 6 7.75 9890 31374 2.0 0 4 ... 2001 0 98039 47.6305 -122.240 4540 42730 1 41264 September-2014
    1123 2303900035 2014-06-11 2890000 5 6.25 8670 64033 2.0 0 4 ... 1965 2003 98177 47.7295 -122.372 4140 81021 1 72703 June-2014
    1583 2426039123 2015-01-30 2420000 5 4.75 7880 24250 2.0 0 2 ... 1996 0 98177 47.7334 -122.362 2740 10761 1 32130 January-2015
    7095 2303900100 2014-09-11 3800000 3 4.25 5510 35000 2.0 0 4 ... 1997 0 98177 47.7296 -122.370 3430 45302 1 40510 September-2014
    8509 4139900180 2015-04-20 2340000 4 2.50 4500 35200 1.0 0 0 ... 1988 0 98006 47.5477 -122.126 4760 35200 1 39700 April-2015
    9446 1068000375 2014-09-23 3200000 6 5.00 7100 18200 2.5 0 0 ... 1933 2002 98199 47.6427 -122.408 3130 6477 1 25300 September-2014
    10387 7237501190 2014-10-10 1780000 4 3.25 4890 13402 2.0 0 0 ... 2004 0 98059 47.5303 -122.131 5790 13539 1 18292 October-2014
    12320 1725059316 2014-11-20 2390000 4 4.00 6330 13296 2.0 0 2 ... 2000 0 98033 47.6488 -122.201 2200 9196 1 19626 November-2014
    12686 853200010 2014-07-01 3800000 5 5.50 7050 42840 1.0 0 2 ... 1978 0 98004 47.6229 -122.220 5070 20570 1 49890 July-2014
    16785 6762700020 2014-10-13 7700000 6 8.00 12050 27600 2.5 0 3 ... 1910 1987 98102 47.6298 -122.323 3940 8800 1 39650 October-2014
    17322 9831200500 2015-03-04 2480000 5 3.75 6810 7500 2.5 0 0 ... 1922 0 98102 47.6285 -122.322 2660 7500 1 14310 March-2015
    20892 3303850390 2014-12-12 2980000 5 5.50 7400 18898 2.0 0 3 ... 2001 0 98006 47.5431 -122.112 6110 26442 1 26298 December-2014
    20917 2470100110 2014-08-04 5570000 5 5.75 9200 35069 2.0 0 0 ... 2001 0 98039 47.6289 -122.233 3560 24345 1 44269 August-2014

    13 rows × 24 columns

    There are only 13 propeties which have the highest quality rating

    Analyzing Feature: ceil_measure

    In [34]:
    #ceil_measure - its highly skewed
    print("Skewness is :", house_df.ceil_measure.skew())
    plt.figure(figsize=(plotSizeX, plotSizeY))
    sns.distplot(house_df.ceil_measure)
    house_df.ceil_measure.describe()
    
    Skewness is : 1.4466644733818372
    
    Out[34]:
    count    21613.000000
    mean      1788.390691
    std        828.090978
    min        290.000000
    25%       1190.000000
    50%       1560.000000
    75%       2210.000000
    max       9410.000000
    Name: ceil_measure, dtype: float64
    In [35]:
    sns.factorplot(x='ceil',y='ceil_measure',data=house_df, size = 4, aspect = 2)
    
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
      warnings.warn(msg)
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` paramter has been renamed to `height`; please update your code.
      warnings.warn(msg, UserWarning)
    
    Out[35]:
    <seaborn.axisgrid.FacetGrid at 0x225ef353f28>

    There is no pattern in Ceil Vs Ceil_measure

    The vertival lines at each point represent the inter quartile range of values at that point

    Analyzing Feature: basement

    In [36]:
    #basement_measure
    plt.figure(figsize=(plotSizeX, plotSizeY))
    sns.distplot(house_df.basement)
    
    Out[36]:
    <matplotlib.axes._subplots.AxesSubplot at 0x225f1238080>

    We can see 2 gaussians, which tells us there are propeties which don't have basements and some have the basements

    In [37]:
    house_df[house_df.basement==0].shape
    
    Out[37]:
    (13126, 24)

    We have almost 60% of the properties without basement

    In [38]:
    #houses have zero measure of basement i.e. they donot have basements
    #let's plot boxplot for properties which have basements only
    house_df_base=house_df[house_df['basement']>0]
    plt.figure(figsize=(plotSizeX, plotSizeY))
    sns.boxplot(house_df_base['basement'])
    
    Out[38]:
    <matplotlib.axes._subplots.AxesSubplot at 0x225f0f92a20>

    We can clearly see, there are outliers. We need to treat this before our model.

    In [39]:
    #checking the no. of data points with 'basement' greater than 4000
    house_df[house_df['basement']>4000]
    
    Out[39]:
    cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... yr_built yr_renovated zipcode lat long living_measure15 lot_measure15 furnished total_area month_year
    668 1924059029 2014-06-17 4670000 5 6.75 9640 13068 1.0 1 4 ... 1983 2009 98040 47.5570 -122.210 3270 10454 1 22708 June-2014
    20740 1225069038 2014-05-05 2280000 7 8.00 13540 307752 3.0 0 4 ... 1999 0 98053 47.6675 -121.986 4850 217800 1 321292 May-2014

    2 rows × 24 columns

    We have only 2 properties with more than 4,000 measure basement

    In [40]:
    #Distribution of houses having basement
    plt.figure(figsize=(plotSizeX, plotSizeY))
    sns.distplot(house_df_base.basement)
    
    Out[40]:
    <matplotlib.axes._subplots.AxesSubplot at 0x225f102bd30>

    Distribution having basement is right-skewed

    Analyzing Feature: yr_built

    In [41]:
    #house range from new to very old
    plt.figure(figsize=(plotSizeX, plotSizeY))
    sns.distplot(house_df.yr_built)
    
    Out[41]:
    <matplotlib.axes._subplots.AxesSubplot at 0x225f125f5c0>

    The built year of the properties range from 1900 to 2014 and we can see upward trend with time

    Analyzing Feature: yr_renovated

    In [42]:
    house_df[house_df['yr_renovated']>0].shape
    
    Out[42]:
    (914, 24)

    Only 914 houses were renovated out of 21613 houses

    In [43]:
    #yr_renovated - plot of houses which are renovated
    house_df_reno=house_df[house_df['yr_renovated']>0]
    plt.figure(figsize=(plotSizeX, plotSizeY))
    sns.distplot(house_df_reno.yr_renovated)
    
    Out[43]:
    <matplotlib.axes._subplots.AxesSubplot at 0x225ef896208>

    Now will create age column from columns : yr_built & yr_renovated

    Analyzing Feature: Zipcode, Lat, Long

    In [46]:
    #For geographic visual
    import geopandas as gpd
    from shapely.geometry import Point, Polygon
    #For current working directory
    import os
    cwd = os.getcwd()
    
    In [47]:
    ## Need to add file USA ZipCodes_1.xlsx to current working directory to access this data
    USAZip=pd.read_excel("USA ZipCodes_1.xlsx",sheet_name="Sheet8")
    USAZip.head()
    
    Out[47]:
    zipcode City County Type
    0 98001 Auburn King Standard
    1 98002 Auburn King Standard
    2 98003 Federal Way King Standard
    3 98004 Bellevue King Standard
    4 98005 Bellevue King Standard
    In [48]:
    house_df=house_df.merge(USAZip,how='left',on='zipcode')
    #house_df.drop_duplicates()
    
    In [49]:
    #let's see the shape of our dataframe
    house_df.shape
    
    Out[49]:
    (21613, 27)

    Now we have 27 features

    In [5]:
    #Add the folder WA to your current working directory
    usa = gpd.read_file(cwd+'\\WA\\WSDOT__City_Limits.shp')
    usa.head()
    gdf = gpd.GeoDataFrame(
        house_df,geometry = [Point(xy) for xy in zip(house_df['long'], house_df['lat'])])
    #We can now plot our ``GeoDataFrame``
    ax=usa[usa.CityName.isin(house_df.City.unique())].plot(
        color='white', edgecolor='black',figsize=(20,8))
    plt.figure(figsize=(15,15))
    gdf.plot(ax=ax, color='green', marker='o',markersize=0.1)
    
    Out[5]:
    <matplotlib.axes._subplots.AxesSubplot at 0x1ccf1142588>
    <Figure size 1080x1080 with 0 Axes>
    In [51]:
    #let's see the columns of dataframe once again
    house_df.columns
    
    Out[51]:
    Index(['cid', 'dayhours', 'price', 'room_bed', 'room_bath', 'living_measure',
           'lot_measure', 'ceil', 'coast', 'sight', 'condition', 'quality',
           'ceil_measure', 'basement', 'yr_built', 'yr_renovated', 'zipcode',
           'lat', 'long', 'living_measure15', 'lot_measure15', 'furnished',
           'total_area', 'month_year', 'City', 'County', 'Type'],
          dtype='object')

    So we have 'City', 'Country', 'Type' as new feature in our dataframe

    In [52]:
    house_df.Type.value_counts()
    
    Out[52]:
    Standard    21613
    Name: Type, dtype: int64

    As the type is same for all the columns, we will remove this column in further analysis

    In [53]:
    house_df.City.value_counts()
    
    Out[53]:
    Seattle          8977
    Renton           1597
    Bellevue         1407
    Kent             1203
    Redmond           979
    Kirkland          977
    Auburn            912
    Sammamish         800
    Federal Way       779
    Issaquah          733
    Maple Valley      590
    Woodinville       471
    Snoqualmie        310
    Kenmore           283
    Mercer Island     282
    Enumclaw          234
    North Bend        221
    Bothell           195
    Duvall            190
    Carnation         124
    Vashon            118
    Black Diamond     100
    Fall City          81
    Medina             50
    Name: City, dtype: int64

    So we have most properties in 'Seattle' city and least in 'Medina' city

    Analyzing Feature: furnished

    In [54]:
    plt.figure(figsize=(plotSizeX, plotSizeY))
    sns.countplot('furnished',data=house_df)
    house_df.furnished.value_counts()
    
    Out[54]:
    0    17362
    1     4251
    Name: furnished, dtype: int64

    Most properties are not furnished. Furnish column need to be converted into categorical column

    BIVARIATE ANALYSIS

    PairPlot

    In [55]:
    # let's plot all the variables and confirm our above deduction with more confidence
    sns.pairplot(house_df, diag_kind = 'kde')
    
    Out[55]:
    <seaborn.axisgrid.PairGrid at 0x225f12a3a90>

    From above pair plot, we observed/deduced below

    1. price: price distribution is Right-Skewed as we deduced earlier from our 5-factor analysis
    2. room_bed: our target variable (price) and room_bed plot is not linear. It's distribution have lot of gaussians
    3. room_bath: It's plot with price has somewhat linear relationship. Distribution has number of gaussians.
    4. living_measure: Plot against price has strong linear relationship. It also have linear relationship with room_bath variable. So might remove one of these 2. Distribution is Right-Skewed.
    5. lot_measure: No clear relationship with price.
    6. ceil: No clear relationship with price. We can see, it's have 6 unique values only. Therefore, we can convert this column into categorical column for values.
    7. coast: No clear relationship with price. Clearly it's categorical variable with 2 unique values.
    8. sight: No clear relationship with price. This has 5 unique values. Can be converted to Categorical variable.
    9. condition: No clear relationship with price. This has 5 unique values. Can be converted to Categorical variable.
    10. quality: Somewhat linear relationship with price. Has discrete values from 1 - 13. Can be converted to Categorical variable.
    11. ceil_measure: Strong linear relationship with price. Also with room_bath and living_measure features. Distribution is Right-Skewed.
    12. basement: No clear relationship with price.
    13. yr_built: No clear relationship with price.
    14. yr_renovated: No clear relationship with price. Have 2 unique values. Can be converted to Categorical Variable which tells whether house is renovated or not.
    15. zipcode, lat, long: No clear relationship with price or any other feature.
    16. living_measure15: Somewhat linear relationship with target feature. It's same as living_measure. Therefore we can drop this variable.
    17. lot_measure15: No clear relationship with price or any other feature.
    18. furnished: No clear relationship with price or any other feature. 2 unique values so can be converted to Categorical Variable
    19. total_area: No clear relationship with price. But it has Very Strong linear relationship with lot_measure. So one of it can be dropped.
    In brief, below featues should be converted to Categorical Variable
        ceil, coast, sight, condition, quality, yr_renovated, furnished
    And below columns can be dropped after checking pearson factor
        zipcode, lat, long, living_measure15, lot_measure15, total_area
    In [56]:
    # let's see corelatoin between the different features
    house_corr = house_df.corr(method ='pearson')
    house_corr
    
    Out[56]:
    cid price room_bed room_bath living_measure lot_measure ceil coast sight condition ... basement yr_built yr_renovated zipcode lat long living_measure15 lot_measure15 furnished total_area
    cid 1.000000 -0.016797 0.001286 0.005160 -0.012258 -0.132109 0.018525 -0.002721 0.011592 -0.023783 ... -0.005151 0.021380 -0.016907 -0.008224 -0.001891 0.020799 -0.002901 -0.138798 -0.010009 -0.131844
    price -0.016797 1.000000 0.308338 0.525134 0.702044 0.089655 0.256786 0.266331 0.397346 0.036392 ... 0.323837 0.053982 0.126442 -0.053168 0.306919 0.021571 0.585374 0.082456 0.565991 0.104796
    room_bed 0.001286 0.308338 1.000000 0.515884 0.576671 0.031703 0.175429 -0.006582 0.079532 0.028472 ... 0.303093 0.154178 0.018841 -0.152668 -0.008931 0.129473 0.391638 0.029244 0.259268 0.044310
    room_bath 0.005160 0.525134 0.515884 1.000000 0.754665 0.087740 0.500653 0.063744 0.187737 -0.124982 ... 0.283770 0.506019 0.050739 -0.203866 0.024573 0.223042 0.568634 0.087175 0.484923 0.104050
    living_measure -0.012258 0.702044 0.576671 0.754665 1.000000 0.172826 0.353949 0.103818 0.284611 -0.058753 ... 0.435043 0.318049 0.055363 -0.199430 0.052529 0.240223 0.756420 0.183286 0.632947 0.194209
    lot_measure -0.132109 0.089655 0.031703 0.087740 0.172826 1.000000 -0.005201 0.021604 0.074710 -0.008958 ... 0.015286 0.053080 0.007644 -0.129574 -0.085683 0.229521 0.144608 0.718557 0.118883 0.999763
    ceil 0.018525 0.256786 0.175429 0.500653 0.353949 -0.005201 1.000000 0.023698 0.029444 -0.263768 ... -0.245705 0.489319 0.006338 -0.059121 0.049614 0.125419 0.279885 -0.011269 0.347749 0.002637
    coast -0.002721 0.266331 -0.006582 0.063744 0.103818 0.021604 0.023698 1.000000 0.401857 0.016653 ... 0.080588 -0.026161 0.092885 0.030285 -0.014274 -0.041910 0.086463 0.030703 0.069882 0.023809
    sight 0.011592 0.397346 0.079532 0.187737 0.284611 0.074710 0.029444 0.401857 1.000000 0.045990 ... 0.276947 -0.053440 0.103917 0.084827 0.006157 -0.078400 0.280439 0.072575 0.220250 0.080693
    condition -0.023783 0.036392 0.028472 -0.124982 -0.058753 -0.008958 -0.263768 0.016653 0.045990 1.000000 ... 0.174105 -0.361417 -0.060618 0.003026 -0.014941 -0.106500 -0.092824 -0.003406 -0.121902 -0.010219
    quality 0.008130 0.667463 0.356967 0.664983 0.762704 0.113621 0.458183 0.082775 0.251321 -0.144674 ... 0.168392 0.446963 0.014414 -0.184862 0.114084 0.198372 0.713202 0.119248 0.788621 0.130004
    ceil_measure -0.010842 0.605566 0.477600 0.685342 0.876597 0.183512 0.523885 0.072075 0.167649 -0.158214 ... -0.051943 0.423898 0.023285 -0.261190 -0.000816 0.343803 0.731870 0.194050 0.652383 0.202127
    basement -0.005151 0.323837 0.303093 0.283770 0.435043 0.015286 -0.245705 0.080588 0.276947 0.174105 ... 1.000000 -0.133124 0.071323 0.074845 0.110538 -0.144765 0.200355 0.017276 0.092847 0.024832
    yr_built 0.021380 0.053982 0.154178 0.506019 0.318049 0.053080 0.489319 -0.026161 -0.053440 -0.361417 ... -0.133124 1.000000 -0.224874 -0.346869 -0.148122 0.409356 0.326229 0.070958 0.305225 0.059889
    yr_renovated -0.016907 0.126442 0.018841 0.050739 0.055363 0.007644 0.006338 0.092885 0.103917 -0.060618 ... 0.071323 -0.224874 1.000000 0.064357 0.029398 -0.068372 -0.002673 0.007854 0.017212 0.008835
    zipcode -0.008224 -0.053168 -0.152668 -0.203866 -0.199430 -0.129574 -0.059121 0.030285 0.084827 0.003026 ... 0.074845 -0.346869 0.064357 1.000000 0.267048 -0.564072 -0.279033 -0.147221 -0.138796 -0.133453
    lat -0.001891 0.306919 -0.008931 0.024573 0.052529 -0.085683 0.049614 -0.014274 0.006157 -0.014941 ... 0.110538 -0.148122 0.029398 0.267048 1.000000 -0.135512 0.048858 -0.086419 0.080952 -0.084175
    long 0.020799 0.021571 0.129473 0.223042 0.240223 0.229521 0.125419 -0.041910 -0.078400 -0.106500 ... -0.144765 0.409356 -0.068372 -0.564072 -0.135512 1.000000 0.334605 0.254451 0.187519 0.233896
    living_measure15 -0.002901 0.585374 0.391638 0.568634 0.756420 0.144608 0.279885 0.086463 0.280439 -0.092824 ... 0.200355 0.326229 -0.002673 -0.279033 0.048858 0.334605 1.000000 0.183192 0.620135 0.160727
    lot_measure15 -0.138798 0.082456 0.029244 0.087175 0.183286 0.718557 -0.011269 0.030703 0.072575 -0.003406 ... 0.017276 0.070958 0.007854 -0.147221 -0.086419 0.254451 0.183192 1.000000 0.129344 0.719692
    furnished -0.010009 0.565991 0.259268 0.484923 0.632947 0.118883 0.347749 0.069882 0.220250 -0.121902 ... 0.092847 0.305225 0.017212 -0.138796 0.080952 0.187519 0.620135 0.129344 1.000000 0.132379
    total_area -0.131844 0.104796 0.044310 0.104050 0.194209 0.999763 0.002637 0.023809 0.080693 -0.010219 ... 0.024832 0.059889 0.008835 -0.133453 -0.084175 0.233896 0.160727 0.719692 0.132379 1.000000

    22 rows × 22 columns

    We have linear relationships in below featues as we got to know from above matrix

    1. price: room_bath, living_measure, quality, living_measure15, furnished
    2. living_measure: price, room_bath. So we can consider dropping 'room_bath' variable.
    3. quality: price, room_bath, living_measure
    4. ceil_measure: price, room_bath, living_measure, quality
    5. living_measure15: price, living_measure, quality. So we can consider dropping living_measure15 as well. As it's giving same info as living_measure.
    6. lot_measure15: lot_measure. Therefore, we can consider dropping lot_measure15, as it's giving same info.
    7. furnished: quality
    8. total_area: lot_measure, lot_measure15. Therefore, we can consider dropping total_area feature as well. As it's giving same info as lot_measure.

    We can plot heatmap and can easily confirm our above findings

    In [57]:
    # Plotting heatmap
    plt.subplots(figsize =(15, 8)) 
    sns.heatmap(house_corr,cmap="YlGnBu",annot=True)
    
    Out[57]:
    <matplotlib.axes._subplots.AxesSubplot at 0x2258d4f79b0>

    Analyzing Bivariate for Feature: month_year

    In [58]:
    #month,year in which house is sold. Price is not influenced by it, though there are outliers and can be easily seen.
    house_df['month_year'] = pd.to_datetime(house_df['month_year'], format='%B-%Y')
    
    house_df.sort_values(["month_year"], axis=0, 
                     ascending=True, inplace=True) 
    house_df["month_year"] = house_df["month_year"].dt.strftime('%B-%Y')
    
    sns.factorplot(x='month_year',y='price',data=house_df, size=4, aspect=2)
    plt.xticks(rotation=90)
    #groupby
    house_df.groupby('month_year')['price'].agg(['mean','median','size'])
    
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
      warnings.warn(msg)
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` paramter has been renamed to `height`; please update your code.
      warnings.warn(msg, UserWarning)
    
    Out[58]:
    mean median size
    month_year
    April-2015 561933.463021 476500 2231
    August-2014 536527.039691 442100 1940
    December-2014 524602.893270 432500 1471
    February-2015 507919.603200 425545 1250
    January-2015 525963.251534 438500 978
    July-2014 544892.161013 465000 2211
    June-2014 558123.736239 465000 2180
    March-2015 544057.683200 450000 1875
    May-2014 548166.600113 465000 1768
    May-2015 558193.095975 455000 646
    November-2014 522058.861800 435000 1411
    October-2014 539127.477636 446900 1878
    September-2014 529315.868095 450000 1774

    The mean price of the houses tend to be high during March,April, May as compared to that of September, October, November,December period.

    Analyzing Bivariate for Feature: room_bed

    In [59]:
    #Room_bed - outliers can be seen easily. Mean and median of price increases with number bedrooms/house uptill a point
    #and then drops
    sns.factorplot(x='room_bed',y='price',data=house_df, size=4, aspect=2)
    
    #groupby
    house_df.groupby('room_bed')['price'].agg(['mean','median','size'])
    
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
      warnings.warn(msg)
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` paramter has been renamed to `height`; please update your code.
      warnings.warn(msg, UserWarning)
    
    Out[59]:
    mean median size
    room_bed
    0 4.102231e+05 288000.0 13
    1 3.176580e+05 299000.0 199
    2 4.013877e+05 374000.0 2760
    3 4.662766e+05 413000.0 9824
    4 6.355647e+05 549997.5 6882
    5 7.868741e+05 620000.0 1601
    6 8.258535e+05 650000.0 272
    7 9.514478e+05 728580.0 38
    8 1.105077e+06 700000.0 13
    9 8.939998e+05 817000.0 6
    10 8.200000e+05 660000.0 3
    11 5.200000e+05 520000.0 1
    33 6.400000e+05 640000.0 1

    There is clear increasing trend in price with room_bed

    In [60]:
    #room_bath - outliers can be seen easily. Overall mean and median price increares with increasing room_bath
    sns.factorplot(x='room_bath',y='price',data=house_df,size=4, aspect=2)
    plt.xticks(rotation=90)
    #groupby
    house_df.groupby('room_bath')['price'].agg(['mean','median','size'])
    
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
      warnings.warn(msg)
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` paramter has been renamed to `height`; please update your code.
      warnings.warn(msg, UserWarning)
    
    Out[60]:
    mean median size
    room_bath
    0.00 4.490950e+05 317500 10
    0.50 2.373750e+05 264000 4
    0.75 2.945209e+05 273500 72
    1.00 3.470412e+05 320000 3852
    1.25 6.217722e+05 516500 9
    1.50 4.093457e+05 370000 1446
    1.75 4.549158e+05 422900 3048
    2.00 4.579050e+05 423250 1930
    2.25 5.337688e+05 472500 2047
    2.50 5.536618e+05 499950 5380
    2.75 6.603505e+05 605000 1185
    3.00 7.086619e+05 600000 753
    3.25 9.707532e+05 835000 589
    3.50 9.324017e+05 820000 731
    3.75 1.198179e+06 1070000 155
    4.00 1.268405e+06 1055000 136
    4.25 1.526653e+06 1380000 79
    4.50 1.334211e+06 1060000 100
    4.75 2.022300e+06 2300000 23
    5.00 1.674167e+06 1430000 21
    5.25 1.817962e+06 1420000 13
    5.50 2.522500e+06 2340000 10
    5.75 2.492500e+06 1930000 4
    6.00 2.948333e+06 2895000 6
    6.25 3.095000e+06 3095000 2
    6.50 1.710000e+06 1710000 2
    6.75 2.735000e+06 2735000 2
    7.50 4.500000e+05 450000 1
    7.75 6.890000e+06 6890000 1
    8.00 4.990000e+06 4990000 2

    There is upward trend in price with increase in room_bath

    Analyzing Bivariate for Feature: living_measure

    In [61]:
    #living_measure - price increases with increase in living measure
    plt.figure(figsize=(plotSizeX, plotSizeY))
    print(sns.scatterplot(house_df['living_measure'],house_df['price']))
    house_df['living_measure'].describe()
    
    AxesSubplot(0.125,0.125;0.775x0.755)
    
    Out[61]:
    count    21613.000000
    mean      2079.899736
    std        918.440897
    min        290.000000
    25%       1427.000000
    50%       1910.000000
    75%       2550.000000
    max      13540.000000
    Name: living_measure, dtype: float64

    There is clear increment in price of the property with increment in the living measure But there seems to be one outlier to this trend. Need to evaluate the same

    Analyzing Bivariate for Feature: lot_measure

    In [62]:
    #lot_measure - there seems to be no relation between lot_measure and price
    #lot_measure - data value range is very large so breaking it get better view.
    plt.figure(figsize=(plotSizeX, plotSizeY))
    print(sns.scatterplot(house_df['lot_measure'],house_df['price']))
    house_df['lot_measure'].describe()
    
    AxesSubplot(0.125,0.125;0.775x0.755)
    
    Out[62]:
    count    2.161300e+04
    mean     1.510697e+04
    std      4.142051e+04
    min      5.200000e+02
    25%      5.040000e+03
    50%      7.618000e+03
    75%      1.068800e+04
    max      1.651359e+06
    Name: lot_measure, dtype: float64

    There doesnt seem to be no relation between lot_measure and price trend

    In [63]:
    #lot_measure <25000
    plt.figure(figsize=(plotSizeX, plotSizeY))
    x=house_df[house_df['lot_measure']<25000]
    print(sns.scatterplot(x['lot_measure'],x['price']))
    x['lot_measure'].describe()
    
    AxesSubplot(0.125,0.125;0.775x0.755)
    
    Out[63]:
    count    19713.000000
    mean      7762.510577
    std       4252.549162
    min        520.000000
    25%       4997.000000
    50%       7253.000000
    75%       9620.000000
    max      24969.000000
    Name: lot_measure, dtype: float64

    Almost 95% of the houses have <25000 lot_measure. But there is no clear trend between lot_measure and price

    In [64]:
    #lot_measure >100000 - price increases with increase in living measure
    plt.figure(figsize=(plotSizeX, plotSizeY))
    y=house_df[house_df['lot_measure']<=75000]
    print(sns.scatterplot(y['lot_measure'],y['price']))
    #y['lot_measure'].describe()
    
    AxesSubplot(0.125,0.125;0.775x0.755)
    

    Analyzing Bivariate for Feature: ceil

    In [65]:
    #ceil - median price increases initially and then falls
    print(sns.factorplot(x='ceil',y='price',data=house_df, size = 4, aspect = 2))
    #groupby
    house_df.groupby('ceil')['price'].agg(['mean','median','size'])
    
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
      warnings.warn(msg)
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` paramter has been renamed to `height`; please update your code.
      warnings.warn(msg, UserWarning)
    
    <seaborn.axisgrid.FacetGrid object at 0x000002259321B9B0>
    
    Out[65]:
    mean median size
    ceil
    1.0 4.422196e+05 390000 10680
    1.5 5.590449e+05 524475 1910
    2.0 6.490515e+05 542950 8241
    2.5 1.061021e+06 799200 161
    3.0 5.826201e+05 490000 613
    3.5 9.339375e+05 534500 8

    There is some slight upward trend in price with the ceil

    Analyzing Bivariate for Feature: coast

    In [66]:
    #coast - mean and median of waterfront view is high however such houses are very small in compare to non-waterfront
    #Also, living_measure mean and median is greater for waterfront house.
    print(sns.factorplot(x='coast',y='price',data=house_df, size = 4, aspect = 2))
    #groupby
    house_df.groupby('coast')['living_measure','price'].agg(['median','mean'])
    
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
      warnings.warn(msg)
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` paramter has been renamed to `height`; please update your code.
      warnings.warn(msg, UserWarning)
    
    <seaborn.axisgrid.FacetGrid object at 0x0000022580B62208>
    
    Out[66]:
    living_measure price
    median mean median mean
    coast
    0 1910 2071.587972 450000 5.316534e+05
    1 2850 3173.687117 1400000 1.662524e+06

    The house properties with water_front tend to have higher price compared to that of non-water_front properties

    Analyzing Bivariate for Feature: sight

    In [67]:
    #sight - have outliers. The house sighted more have high price (mean and median) and have large living area as well.
    print(sns.factorplot(x='sight',y='price',data=house_df, size = 4, aspect = 2))
    #groupby
    house_df.groupby('sight')['price','living_measure'].agg(['mean','median','size'])
    
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
      warnings.warn(msg)
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` paramter has been renamed to `height`; please update your code.
      warnings.warn(msg, UserWarning)
    
    <seaborn.axisgrid.FacetGrid object at 0x00000225960E3080>
    
    Out[67]:
    price living_measure
    mean median size mean median size
    sight
    0 4.966235e+05 432500 19489 1997.761660 1850 19489
    1 8.125186e+05 690944 332 2568.960843 2420 332
    2 7.927462e+05 675000 963 2655.257529 2470 963
    3 9.724684e+05 802500 510 3018.564706 2840 510
    4 1.464363e+06 1190000 319 3351.473354 3050 319

    Properties with higher price have more no.of sights compared to that of houses with lower price

    In [68]:
    #Sight - Viewed in relation with price and living_measure
    #Costlier houses with large living area are sighted more.
    plt.figure(figsize=(plotSizeX, plotSizeY))
    print(sns.scatterplot(house_df['living_measure'],house_df['price'],hue=house_df['sight'],palette='Paired',legend='full'))
    
    AxesSubplot(0.125,0.125;0.775x0.755)
    

    The above graph also justify that: Properties with higher price have more no.of sights compared to that of houses with lower price

    Analyzing Bivariate for Feature: condition

    In [69]:
    #condition - as the condition rating increases its price and living measure mean and median also increases.
    print(sns.factorplot(x='condition',y='price',data=house_df, size = 4, aspect = 2))
    #groupby
    house_df.groupby('condition')['price','living_measure'].agg(['mean','median','size'])
    
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
      warnings.warn(msg)
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` paramter has been renamed to `height`; please update your code.
      warnings.warn(msg, UserWarning)
    
    <seaborn.axisgrid.FacetGrid object at 0x00000225FFCB87F0>
    
    Out[69]:
    price living_measure
    mean median size mean median size
    condition
    1 334431.666667 262500 30 1216.000000 1000 30
    2 327316.215116 279000 172 1410.058140 1320 172
    3 542097.086024 450000 14031 2149.042050 1970 14031
    4 521300.705230 440000 5679 1950.991724 1820 5679
    5 612577.742504 526000 1701 2022.911229 1880 1701

    The price of the house increases with condition rating of the house

    In [70]:
    #Condition - Viewed in relation with price and living_measure. Most houses are rated as 3 or more. 
    #We can see some outliers as well
    plt.figure(figsize=(plotSizeX, plotSizeY))
    print(sns.scatterplot(house_df['living_measure'],house_df['price'],hue=house_df['condition'],palette='Paired',legend='full'))
    
    AxesSubplot(0.125,0.125;0.775x0.755)
    

    So we found out that smaller houses are in better condition and better condition houses are having higher prices

    Analyzing Bivariate for Feature: quality

    In [71]:
    #quality - with grade increase price and living_measure increase (mean and median)
    
    print(sns.factorplot(x='quality',y='price',data=house_df, size = 4, aspect = 2))
    #groupby
    house_df.groupby('quality')['price','living_measure'].agg(['mean','median','size'])
    
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
      warnings.warn(msg)
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` paramter has been renamed to `height`; please update your code.
      warnings.warn(msg, UserWarning)
    
    <seaborn.axisgrid.FacetGrid object at 0x000002258C021320>
    
    Out[71]:
    price living_measure
    mean median size mean median size
    quality
    1 1.420000e+05 142000.0 1 290.000000 290 1
    3 2.056667e+05 262000.0 3 596.666667 600 3
    4 2.143810e+05 205000.0 29 660.482759 660 29
    5 2.485240e+05 228700.0 242 983.326446 905 242
    6 3.019166e+05 275276.5 2038 1191.561335 1120 2038
    7 4.025933e+05 375000.0 8981 1689.400401 1630 8981
    8 5.428955e+05 510000.0 6068 2184.748517 2150 6068
    9 7.737382e+05 720000.0 2615 2868.139962 2820 2615
    10 1.072347e+06 914327.0 1134 3520.299824 3450 1134
    11 1.497792e+06 1280000.0 399 4395.448622 4260 399
    12 2.192500e+06 1820000.0 90 5471.588889 4965 90
    13 3.710769e+06 2980000.0 13 7483.076923 7100 13

    There is clear increase in price of the house with higher rating on quality

    In [72]:
    #quality - Viewed in relation with price and living_measure. Most houses are graded as 6 or more. 
    #We can see some outliers as well
    plt.figure(figsize=(plotSizeX, plotSizeY))
    print(sns.scatterplot(house_df['living_measure'],house_df['price'],hue=house_df['quality'],palette='coolwarm_r',
                          legend='full'))
    
    AxesSubplot(0.125,0.125;0.775x0.755)
    

    Analyzing Bivariate for Feature: ceil_measure

    In [73]:
    #ceil_measure - price increases with increase in ceil measure
    plt.figure(figsize=(plotSizeX, plotSizeY))
    print(sns.scatterplot(house_df['ceil_measure'],house_df['price']))
    house_df['ceil_measure'].describe()
    
    AxesSubplot(0.125,0.125;0.775x0.755)
    
    Out[73]:
    count    21613.000000
    mean      1788.390691
    std        828.090978
    min        290.000000
    25%       1190.000000
    50%       1560.000000
    75%       2210.000000
    max       9410.000000
    Name: ceil_measure, dtype: float64

    There is upward trend in price with ceil_measure

    Analyzing Bivariate for Feature: basement

    In [74]:
    #basement - price increases with increase in ceil measure
    plt.figure(figsize=(plotSizeX, plotSizeY))
    print(sns.scatterplot(house_df['basement'],house_df['price']))
    house_df['basement'].describe()
    
    AxesSubplot(0.125,0.125;0.775x0.755)
    
    Out[74]:
    count    21613.000000
    mean       291.509045
    std        442.575043
    min          0.000000
    25%          0.000000
    50%          0.000000
    75%        560.000000
    max       4820.000000
    Name: basement, dtype: float64

    We will create the categorical variable for basement 'has_basement' for houses with basement and no basement.This categorical variable will be used for further analysis.

    In [75]:
    #Binning Basement to analyse data
    def create_basement_group(series):
        if series == 0:
            return "No"
        elif series > 0:
            return "Yes"
        
    house_df['has_basement'] = house_df['basement'].apply(create_basement_group)
    
    In [76]:
    #basement - after binning we data shows with basement houses are costlier and have higher 
    #living measure (mean & median)
    print(sns.factorplot(x='has_basement',y='price',data=house_df, size = 4, aspect = 2))
    house_df.groupby('has_basement')['price','living_measure'].agg(['mean','median','size'])
    
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
      warnings.warn(msg)
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` paramter has been renamed to `height`; please update your code.
      warnings.warn(msg, UserWarning)
    
    <seaborn.axisgrid.FacetGrid object at 0x0000022580B9A470>
    
    Out[76]:
    price living_measure
    mean median size mean median size
    has_basement
    No 486945.394789 411500 13126 1928.879628 1740 13126
    Yes 622518.174384 515000 8487 2313.467539 2100 8487

    The houses with basement has better price compared to that of houses without basement

    In [77]:
    #basement - have higher price & living measure
    plt.figure(figsize=(plotSizeX, plotSizeY))
    print(sns.scatterplot(house_df['living_measure'],house_df['price'],hue=house_df['has_basement']))
    
    AxesSubplot(0.125,0.125;0.775x0.755)
    
    In [78]:
    #yr_built - outliers can be seen easily.
    plt.figure(figsize=(plotSizeX, plotSizeY))
    print(sns.scatterplot(house_df['yr_built'],house_df['living_measure']))
    #groupby
    house_df.groupby('yr_built')['price'].agg(['mean','median','size'])
    
    AxesSubplot(0.125,0.125;0.775x0.755)
    
    Out[78]:
    mean median size
    yr_built
    1900 581536.632184 549000 87
    1901 557108.344828 550000 29
    1902 673192.592593 624000 27
    1903 480958.195652 461000 46
    1904 583867.755556 478000 45
    1905 753443.932432 597500 74
    1906 670027.663043 555000 92
    1907 676324.476923 595000 65
    1908 564499.848837 519475 86
    1909 696448.989362 575500 94
    1910 671671.835821 542500 134
    1911 632584.246575 606000 73
    1912 613193.227848 557510 79
    1913 586066.271186 535000 59
    1914 615246.074074 553300 54
    1915 585036.921875 549500 64
    1916 601041.620253 515000 79
    1917 528126.785714 450000 56
    1918 492346.875000 412450 120
    1919 537887.556818 487900 88
    1920 477761.030612 448500 98
    1921 613224.210526 547500 76
    1922 569794.147368 515000 95
    1923 618653.773810 498376 84
    1924 570419.928058 525000 139
    1925 607316.606061 535000 165
    1926 625443.377778 560000 180
    1927 654154.208696 605000 115
    1928 621920.198413 547500 126
    1929 574396.842105 523475 114
    ... ... ... ...
    1986 476989.069767 419500 215
    1987 517565.010204 471500 294
    1988 583930.400000 500000 270
    1989 583063.403448 490000 290
    1990 564133.384375 457500 320
    1991 630630.647321 534150 224
    1992 548205.924242 472500 198
    1993 556760.455446 435000 202
    1994 486864.040161 439000 249
    1995 577933.757396 496000 169
    1996 639673.528205 540000 195
    1997 606173.887006 515000 177
    1998 594280.146444 500000 239
    1999 640431.177358 499900 265
    2000 682003.619266 544250 218
    2001 741340.042623 585000 305
    2002 578818.481982 447500 222
    2003 558791.367299 450500 422
    2004 596095.004619 507000 433
    2005 580895.468889 486000 450
    2006 631041.548458 510500 454
    2007 615193.292566 480000 417
    2008 642037.716621 500000 367
    2009 518462.186957 416375 230
    2010 551678.384615 448500 143
    2011 544648.384615 440000 130
    2012 527436.982353 448475 170
    2013 678599.582090 565000 201
    2014 683792.685152 599000 559
    2015 759970.947368 629500 38

    116 rows × 3 columns

    We will create new variable: Houselandratio - This is proportion of living area in the total area of the house. We will explore the trend of price against this houselandratio.

    In [79]:
    #HouseLandRatio - Computing new variable as ratio of living_measure/total_area
    #Significes - Land used for construction of house
    house_df["HouseLandRatio"]=np.round((house_df['living_measure']/house_df['total_area']),2)*100
    house_df["HouseLandRatio"].head()
    
    Out[79]:
    17786    19.0
    3782     16.0
    10069    16.0
    7114     24.0
    10080    22.0
    Name: HouseLandRatio, dtype: float64

    Analyzing Bivariate for Feature: yr_renovated

    In [80]:
    #yr_renovated - 
    plt.figure(figsize=(plotSizeX, plotSizeY))
    x=house_df[house_df['yr_renovated']>0]
    print(sns.scatterplot(x['yr_renovated'],x['price']))
    #groupby
    x.groupby('yr_renovated')['price'].agg(['mean','median','size'])
    
    AxesSubplot(0.125,0.125;0.775x0.755)
    
    Out[80]:
    mean median size
    yr_renovated
    1934 4.599500e+05 459950.0 1
    1940 3.784000e+05 378400.0 2
    1944 5.210000e+05 521000.0 1
    1945 3.986667e+05 375000.0 3
    1946 3.511375e+05 351137.5 2
    1948 4.100000e+05 410000.0 1
    1950 2.914500e+05 291450.0 2
    1951 2.760000e+05 276000.0 1
    1953 2.458167e+05 247500.0 3
    1954 9.000000e+05 900000.0 1
    1955 4.421667e+05 399000.0 3
    1956 9.306667e+05 1140000.0 3
    1957 2.915333e+05 249900.0 3
    1958 5.595760e+05 397380.0 5
    1959 3.975000e+05 397500.0 1
    1960 4.771750e+05 299350.0 4
    1962 6.150000e+05 615000.0 2
    1963 4.977125e+05 402500.0 4
    1964 3.567200e+05 325000.0 5
    1965 7.822000e+05 580000.0 5
    1967 2.686000e+05 268600.0 2
    1968 4.835125e+05 425000.0 8
    1969 5.291250e+05 555750.0 4
    1970 5.230444e+05 450000.0 9
    1971 4.182775e+05 418277.5 2
    1972 6.197500e+05 522000.0 4
    1973 4.172000e+05 440000.0 5
    1974 4.025000e+05 310000.0 3
    1975 5.052500e+05 521750.0 6
    1976 4.016667e+05 335000.0 3
    ... ... ... ...
    1986 6.230582e+05 520000.0 17
    1987 1.206778e+06 624000.0 18
    1988 7.227600e+05 588000.0 15
    1989 6.397886e+05 560000.0 22
    1990 7.491200e+05 730000.0 25
    1991 9.650450e+05 792500.0 20
    1992 6.967941e+05 599000.0 17
    1993 8.480032e+05 805000.0 19
    1994 9.430265e+05 780000.0 19
    1995 8.055231e+05 536475.0 16
    1996 7.496633e+05 710000.0 15
    1997 6.203960e+05 569950.0 15
    1998 7.737316e+05 526000.0 19
    1999 1.030706e+06 840000.0 17
    2000 8.090843e+05 755000.0 35
    2001 1.089489e+06 675000.0 19
    2002 1.216498e+06 890000.0 22
    2003 9.923056e+05 767500.0 36
    2004 7.820769e+05 721250.0 26
    2005 8.151957e+05 744000.0 35
    2006 7.890396e+05 654050.0 24
    2007 8.389221e+05 797000.0 35
    2008 1.034499e+06 801500.0 18
    2009 9.006824e+05 521000.0 22
    2010 9.926694e+05 845000.0 18
    2011 6.074962e+05 577000.0 13
    2012 6.251818e+05 515000.0 11
    2013 6.649608e+05 560000.0 37
    2014 6.550301e+05 575000.0 91
    2015 6.591562e+05 651000.0 16

    69 rows × 3 columns

    So most houses are renovated after 1980's. We will create new categorical variable 'has_renovated' to categorize the property as renovated and non-renovated. For further ananlysis we will use this categorical variable.

    In [81]:
    #Lets try to group yr_renovated
    #Binning Basement to analyse data
    def create_renovated_group(series):
        if series == 0:
            return "No"
        elif series > 0:
            return "Yes"
        
    house_df['has_renovated'] = house_df['yr_renovated'].apply(create_renovated_group)
    
    In [84]:
    #has_renovated - renovated have higher mean and median, however it does not confirm if the prices of house renovated 
    #actually increased or not.
    #HouseLandRatio - Renovated house utilized more land area for construction of house
    plt.figure(figsize=(plotSizeX, plotSizeY))
    print(sns.scatterplot(house_df['living_measure'],house_df['price'],hue=house_df['has_renovated']))
    #groupby
    house_df.groupby(['has_renovated'])['price','HouseLandRatio'].agg(['mean','median','size'])
    
    AxesSubplot(0.125,0.125;0.775x0.755)
    
    Out[84]:
    price HouseLandRatio
    mean median size mean median size
    has_renovated
    No 530447.958597 448000 20699 22.067056 20.0 20699
    Yes 760628.777899 600000 914 22.296499 21.0 914

    Renovated properties have higher price than others with same living measure space.

    In [85]:
    #pd.crosstab(house_df['yearbuilt_group'],house_df['has_renovated'])
    
    In [86]:
    #has_renovated - have higher price & living measure
    plt.figure(figsize=(plotSizeX, plotSizeY))
    x=house_df[house_df['yr_built']<2000]
    print(sns.scatterplot(x['living_measure'],x['price'],hue=x['has_renovated']))
    
    AxesSubplot(0.125,0.125;0.775x0.755)
    

    Analyzing Bivariate for Feature: furnished

    In [87]:
    #furnished - Furnished has higher price value and has greater living_measure
    plt.figure(figsize=(plotSizeX, plotSizeY))
    print(sns.scatterplot(house_df['living_measure'],house_df['price'],hue=house_df['furnished']))
    #groupby
    house_df.groupby('furnished')['price','living_measure','HouseLandRatio'].agg(['mean','median','size'])
    
    AxesSubplot(0.125,0.125;0.775x0.755)
    
    Out[87]:
    price living_measure HouseLandRatio
    mean median size mean median size mean median size
    furnished
    0 437300.158968 401000 17362 1792.256652 1720 17362 21.508236 19.0 17362
    1 960374.414961 810000 4251 3254.696072 3110 4251 24.398730 24.0 4251

    Furnished houses have higher price than that of the Non-furnished houses

    Analyzing Bivariate for Feature: city

    In [88]:
    #City - outliers can be seen easily.
    
    print(sns.factorplot(x='City',y='price',data=house_df, size = 4, aspect = 2))
    plt.xticks(rotation=90)
    #groupby
    house_df.groupby('City')['price'].agg(['mean','median','size']).sort_values(by='median',ascending=False)
    
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
      warnings.warn(msg)
    C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` paramter has been renamed to `height`; please update your code.
      warnings.warn(msg, UserWarning)
    
    <seaborn.axisgrid.FacetGrid object at 0x0000022593C63C88>
    
    Out[88]:
    mean median size
    City
    Medina 2.161300e+06 1895000.0 50
    Mercer Island 1.194874e+06 993750.0 282
    Bellevue 8.984661e+05 749000.0 1407
    Sammamish 7.328210e+05 688500.0 800
    Redmond 6.589089e+05 625000.0 979
    Issaquah 6.151222e+05 572000.0 733
    Woodinville 6.174979e+05 570000.0 471
    Kirkland 6.465428e+05 510000.0 977
    Snoqualmie 5.280031e+05 500000.0 310
    Bothell 4.903771e+05 470000.0 195
    Vashon 4.874805e+05 463750.0 118
    Fall City 5.806379e+05 460000.0 81
    Seattle 5.350695e+05 453000.0 8977
    Kenmore 4.624889e+05 445000.0 283
    Carnation 4.556171e+05 415000.0 124
    Duvall 4.248151e+05 401250.0 190
    North Bend 4.395073e+05 399500.0 221
    Black Diamond 4.236660e+05 359999.5 100
    Renton 4.034685e+05 358000.0 1597
    Maple Valley 3.668761e+05 342000.0 590
    Kent 2.995499e+05 283200.0 1203
    Enumclaw 3.157093e+05 279500.0 234
    Auburn 2.914815e+05 270000.0 912
    Federal Way 2.893913e+05 268000.0 779

    From the above graph, few cities have higher average price of the houses compared to others. We need to further analyse why the price varies among cities.

    In [89]:
    #City mean price distribution with average
    city_price=pd.DataFrame(house_df.groupby('City')['price'].agg(['mean','median','size']))
    
    indx=city_price.index
    overall_price_mean=np.mean(house_df['price'])
    overall_price_median=np.median(house_df['price'])
    
    fig, ax1 = plt.subplots(figsize=(plotSizeX, plotSizeY))
    barlist=ax1.bar(city_price.index,city_price['mean'],color='gray')
    plt.xticks(rotation=90)
    ax1.axhline(overall_price_mean, color="red")
    ax1.text(1.02, overall_price_mean, "{0:.2f}".format(round(overall_price_mean,2)), va='center', ha="left", bbox=dict(facecolor="w",alpha=0.5),
            transform=ax1.get_yaxis_transform())
    plt.title("Cities and Mean Price")
    plt.show()
    

    As we can see from above grapgh, majorly below cities have higher mean house prices

    1. Bellevue
    2. Fall City
    3. Federal Way
    4. Kirkland
    5. Medina
    6. Mercer Island
    7. Redmond
    8. Sammanmish
    9. Woodinville
    In [90]:
    #City median price distribution with average
    fig, ax1 = plt.subplots(figsize=(plotSizeX, plotSizeY))
    barlist=ax1.bar(city_price.index,city_price['median'],color='green')
    plt.xticks(rotation=90)
    ax1.axhline(overall_price_median, color="red")
    ax1.text(1.02, overall_price_median, "{0:.2f}".format(round(overall_price_median,2)), va='center', ha="left", bbox=dict(facecolor="w",alpha=0.5),
            transform=ax1.get_yaxis_transform())
    
    plt.title("Cities and Median Price")
    plt.show()
    

    As we can see from above grapgh, majorly below cities have higher median house prices

    1. Bellevue
    2. Bothell
    3. Issaquah
    4. Kirkland
    5. Medina
    6. Mercer Island
    7. Redmond
    8. Sammanmish
    9. Snoqualmie
    10. Woodinville
    In [91]:
    #let's make the copy of the dataframe, before making any furhter changes
    house_df_bdp=house_df.copy()
    

    DATA PROCESSING

    Treating Outlilers

    We have seen outliers for columns room_bath(33 bed), living_measure, lot_measure, ceil_measure and Basement

    In [92]:
    def outlier_treatment(datacolumn):
        sorted(datacolumn)
        Q1,Q3 = np.percentile(datacolumn , [25,75])
        IQR = Q3-Q1
        lower_range = Q1-(1.5 * IQR)
        upper_range = Q3+(1.5 * IQR)
        return lower_range,upper_range
    

    Using the above function, lets get the lowerbound and upperbound values

    Treating outliers for column - ceil_measure

    In [93]:
    lowerbound,upperbound = outlier_treatment(house_df.ceil_measure)
    print(lowerbound,upperbound)
    
    -340.0 3740.0
    

    Lets check which column is considered as an outlier

    In [94]:
    house_df[(house_df.ceil_measure < lowerbound) | (house_df.ceil_measure > upperbound)]
    
    Out[94]:
    cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... lot_measure15 furnished total_area month_year City County Type has_basement HouseLandRatio has_renovated
    7142 7397300220 2014-05-29 2750000 4 3.25 4430 21000 2.0 0 0 ... 20000 1 25430 May-2014 Medina King Standard No 17.0 Yes
    10270 3221059044 2014-05-23 799950 4 3.50 4220 196817 2.0 0 0 ... 195395 1 201037 May-2014 Auburn King Standard No 2.0 No
    9770 2424049029 2014-05-29 3100000 6 4.25 6980 15682 3.0 0 4 ... 18367 1 22662 May-2014 Mercer Island King Standard Yes 31.0 No
    9909 1724069059 2014-05-24 2000000 5 4.00 4580 4443 3.0 1 4 ... 4443 1 9023 May-2014 Sammamish King Standard No 51.0 No
    3712 9359100500 2014-05-27 1800000 4 3.25 4060 13000 2.0 0 3 ... 13800 1 17060 May-2014 Mercer Island King Standard No 24.0 No
    3628 4131900042 2014-05-16 2000000 5 4.25 6490 10862 2.0 0 3 ... 14080 1 17352 May-2014 Mercer Island King Standard Yes 37.0 No
    18900 3521059134 2014-05-23 900000 3 3.50 4080 217697 1.5 0 3 ... 217790 1 221777 May-2014 Auburn King Standard No 2.0 No
    13664 2481620310 2014-05-14 1120000 4 2.25 4470 60373 2.0 0 0 ... 40450 1 64843 May-2014 Woodinville King Standard No 7.0 No
    20740 1225069038 2014-05-05 2280000 7 8.00 13540 307752 3.0 0 4 ... 217800 1 321292 May-2014 Redmond King Standard Yes 4.0 No
    10672 3892500150 2014-05-21 1550000 3 2.50 4460 26027 2.0 0 0 ... 26027 1 30487 May-2014 Kirkland King Standard No 15.0 No
    3964 1829300210 2014-05-06 762300 4 2.50 3880 14550 2.0 0 0 ... 14045 1 18430 May-2014 Sammamish King Standard No 21.0 No
    10294 525069127 2014-05-23 1200000 4 3.50 4740 172497 2.0 0 0 ... 49658 1 177237 May-2014 Redmond King Standard No 3.0 No
    3996 3630200780 2014-05-22 1050000 4 3.75 3860 5474 2.5 0 0 ... 5474 1 9334 May-2014 Issaquah King Standard No 41.0 No
    10540 2524069097 2014-05-09 2240000 5 6.50 7270 130017 2.0 0 0 ... 44890 1 137287 May-2014 Issaquah King Standard Yes 5.0 No
    13827 824059042 2014-05-30 1890000 5 3.50 4180 17935 2.0 0 0 ... 13760 1 22115 May-2014 Bellevue King Standard No 19.0 No
    10462 7237550130 2014-05-20 1300000 4 3.50 4380 74052 1.0 0 0 ... 62291 1 78432 May-2014 Redmond King Standard No 6.0 No
    3089 3616600250 2014-05-27 1600000 3 3.25 3790 19000 2.0 0 4 ... 18628 1 22790 May-2014 Seattle King Standard No 17.0 No
    15646 98000960 2014-05-13 1050000 4 3.25 4400 16625 2.0 0 0 ... 15523 1 21025 May-2014 Sammamish King Standard No 21.0 No
    8153 425069020 2014-05-05 1090000 4 2.50 4340 141570 2.5 0 0 ... 97138 1 145910 May-2014 Redmond King Standard No 3.0 No
    8484 4039800080 2014-05-29 1360000 5 3.50 5960 13703 2.0 0 2 ... 17320 1 19663 May-2014 Bellevue King Standard Yes 30.0 No
    15135 3625700010 2014-05-06 1870000 5 4.00 4510 15175 2.0 0 0 ... 13500 1 19685 May-2014 Mercer Island King Standard No 23.0 Yes
    8769 2524049318 2014-05-28 2000000 4 3.00 4260 18000 2.0 0 2 ... 17015 1 22260 May-2014 Mercer Island King Standard No 19.0 No
    15404 3276940100 2014-05-22 1000000 4 3.00 4260 18687 2.0 0 0 ... 16772 1 22947 May-2014 Sammamish King Standard No 19.0 No
    8358 4100500070 2014-05-27 1710000 5 4.50 4590 14685 2.0 0 0 ... 9486 1 19275 May-2014 Kirkland King Standard No 24.0 No
    9507 8691310840 2014-05-09 833000 4 2.75 3780 10308 2.0 0 0 ... 10740 1 14088 May-2014 Sammamish King Standard No 27.0 No
    7743 6613000935 2014-05-13 2560000 4 2.50 5300 26211 2.0 1 2 ... 19281 1 31511 May-2014 Seattle King Standard Yes 17.0 No
    3330 5710000005 2014-05-22 2150000 4 5.50 5060 10320 2.0 0 0 ... 10080 1 15380 May-2014 Bellevue King Standard No 33.0 No
    19115 6648150040 2014-05-13 1680000 5 3.25 4860 23723 2.0 0 2 ... 13860 1 28583 May-2014 Mercer Island King Standard Yes 17.0 No
    15697 3758900259 2014-05-07 1040000 4 3.50 3900 8391 2.0 0 0 ... 12268 1 12291 May-2014 Kirkland King Standard No 32.0 No
    3187 1853080640 2014-05-14 966000 5 4.50 3810 8019 2.0 0 0 ... 7713 1 11829 May-2014 Sammamish King Standard No 32.0 No
    ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
    3232 7135520300 2015-04-07 1300000 3 2.75 4120 16365 1.0 0 2 ... 14110 1 20485 April-2015 Renton King Standard No 20.0 No
    15540 98000740 2015-04-01 945000 5 3.50 4380 14925 2.0 0 0 ... 14633 1 19305 April-2015 Sammamish King Standard No 23.0 No
    15481 2726059144 2015-04-10 1040000 5 3.75 4570 10194 2.0 0 0 ... 7560 1 14764 April-2015 Kirkland King Standard No 31.0 No
    8270 3401700150 2015-04-23 1350000 5 3.00 5530 38816 1.5 0 2 ... 44417 1 44346 April-2015 Woodinville King Standard No 12.0 Yes
    19626 3295610080 2015-04-01 912000 4 2.75 4030 10888 2.0 0 0 ... 10756 1 14918 April-2015 Sammamish King Standard No 27.0 No
    14955 3121500150 2015-04-23 894000 4 2.50 3800 22029 2.0 0 0 ... 24979 1 25829 April-2015 Redmond King Standard No 15.0 No
    15084 2625069070 2015-04-10 1390000 4 3.25 4860 181319 2.5 0 0 ... 181319 1 186179 April-2015 Sammamish King Standard No 3.0 No
    2878 644000040 2015-04-29 1780000 4 3.25 3950 10912 2.0 0 0 ... 10998 1 14862 April-2015 Bellevue King Standard No 27.0 No
    8561 3585900500 2015-04-02 1530000 4 4.25 4720 21000 3.0 0 4 ... 20000 1 25720 April-2015 Seattle King Standard No 18.0 No
    8682 3860900035 2015-04-15 1940000 5 3.50 4230 16526 2.0 0 0 ... 12362 1 20756 April-2015 Bellevue King Standard No 20.0 No
    15893 98300230 2015-04-28 1460000 4 4.00 4620 130208 2.0 0 0 ... 131007 1 134828 April-2015 Fall City King Standard No 3.0 No
    7749 6790830090 2015-04-15 1060000 4 3.50 4220 8417 3.0 0 0 ... 8435 1 12637 April-2015 Sammamish King Standard No 33.0 No
    7941 2481630030 2015-04-27 965000 4 2.50 3920 41206 2.0 0 0 ... 36562 1 45126 April-2015 Woodinville King Standard No 9.0 No
    15862 7237550110 2015-04-24 1180000 4 3.25 3750 74052 2.0 0 0 ... 74052 1 77802 April-2015 Redmond King Standard No 5.0 No
    19172 713500020 2015-04-21 1390000 4 4.50 4490 24767 2.0 0 2 ... 32700 1 29257 April-2015 Bellevue King Standard Yes 15.0 No
    7847 7853440140 2015-04-09 802945 5 3.50 4000 9234 2.0 0 0 ... 6600 1 13234 April-2015 Fall City King Standard No 30.0 No
    13999 1126059201 2015-05-04 1270000 5 3.25 4410 35192 2.0 0 2 ... 59677 1 39602 May-2015 Woodinville King Standard Yes 11.0 No
    9320 1525059261 2015-05-05 1900000 5 4.50 5160 44315 2.0 0 0 ... 44315 1 49475 May-2015 Bellevue King Standard No 10.0 No
    2730 7853440050 2015-05-05 771005 5 4.50 4000 6713 2.0 0 0 ... 6600 1 10713 May-2015 Fall City King Standard No 37.0 No
    5687 3751600409 2015-05-08 510000 4 2.50 4073 17334 2.0 0 0 ... 9625 0 21407 May-2015 Auburn King Standard No 19.0 No
    5620 6065300370 2015-05-06 4210000 5 6.00 7440 21540 2.0 0 0 ... 19329 1 28980 May-2015 Bellevue King Standard Yes 26.0 No
    21004 3303960250 2015-05-07 1050000 4 3.25 4020 11588 2.0 0 0 ... 8066 1 15608 May-2015 Renton King Standard No 26.0 No
    15596 1925059254 2015-05-07 3000000 5 4.00 6670 16481 2.0 0 0 ... 16607 1 23151 May-2015 Bellevue King Standard Yes 29.0 No
    13440 1623089165 2015-05-06 920000 4 3.75 4030 503989 2.0 0 0 ... 71874 1 508019 May-2015 North Bend King Standard No 1.0 No
    15586 1266200140 2015-05-06 1850000 4 3.25 4160 10335 2.0 0 0 ... 10333 1 14495 May-2015 Bellevue King Standard No 29.0 No
    9588 7237501380 2015-05-07 1270000 4 3.50 4640 13404 2.0 0 0 ... 13590 1 18044 May-2015 Renton King Standard No 26.0 No
    17098 2424059174 2015-05-08 2000000 4 3.25 5640 35006 2.0 0 2 ... 35033 1 40646 May-2015 Bellevue King Standard Yes 14.0 No
    13099 3024059057 2015-05-01 1650000 4 4.50 5550 16065 2.0 0 0 ... 16488 1 21615 May-2015 Mercer Island King Standard Yes 26.0 No
    19121 4389201095 2015-05-11 3650000 5 3.75 5020 8694 2.0 0 1 ... 11275 1 13714 May-2015 Bellevue King Standard Yes 37.0 No
    13112 7960900060 2015-05-04 2900000 4 3.25 5050 20100 1.5 0 2 ... 20060 1 25150 May-2015 Bellevue King Standard Yes 20.0 Yes

    611 rows × 30 columns

    We got 611 records which are outliers

    In [95]:
    #dropping the record from the dataset
    house_df.drop(house_df[ (house_df.ceil_measure > upperbound) | (house_df.ceil_measure < lowerbound) ].index, inplace=True)
    
    In [96]:
    house_df.shape
    
    Out[96]:
    (21002, 30)
    In [97]:
    #ceil_measure
    print("Skewness is :", house_df.ceil_measure.skew())
    plt.figure(figsize=(plotSizeX, plotSizeY))
    sns.distplot(house_df.ceil_measure)
    house_df.ceil_measure.describe()
    
    Skewness is : 0.8198869256569326
    
    Out[97]:
    count    21002.000000
    mean      1712.238168
    std        696.044073
    min        290.000000
    25%       1180.000000
    50%       1540.000000
    75%       2140.000000
    max       3740.000000
    Name: ceil_measure, dtype: float64

    After treating outliers of ceil_measure, the data has reduced by about 600(~3%) data points but data is nicely distributed

    Treating outliers for column - basement

    In [98]:
    lowerbound_base,upperbound_base = outlier_treatment(house_df.basement)
    print(lowerbound_base,upperbound_base)
    
    -855.0 1425.0
    
    In [99]:
    house_df[(house_df.basement < lowerbound_base) | (house_df.basement > upperbound_base)]
    
    Out[99]:
    cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... lot_measure15 furnished total_area month_year City County Type has_basement HouseLandRatio has_renovated
    16357 3211270170 2014-05-23 404000 4 3.00 4060 35621 1.0 0 0 ... 35259 1 39681 May-2014 Auburn King Standard Yes 10.0 No
    7386 5700003640 2014-05-19 2100000 5 3.75 5340 10655 2.5 0 3 ... 9418 1 15995 May-2014 Seattle King Standard Yes 33.0 No
    9727 5119010090 2014-05-10 549900 5 2.75 3060 7015 1.0 0 0 ... 7600 0 10075 May-2014 Seattle King Standard Yes 30.0 No
    16069 7663700968 2014-05-28 565000 7 4.50 4140 9066 1.0 0 0 ... 1865 0 13206 May-2014 Seattle King Standard Yes 31.0 No
    1783 7430200100 2014-05-14 1220000 4 3.50 4910 9444 1.5 0 0 ... 11063 1 14354 May-2014 Sammamish King Standard Yes 34.0 No
    1145 7856410030 2014-05-05 1030000 5 2.75 3190 16920 1.0 0 3 ... 13100 1 20110 May-2014 Bellevue King Standard Yes 16.0 No
    13624 7855801610 2014-05-19 1220000 4 2.50 3190 8684 1.0 0 3 ... 8684 1 11874 May-2014 Bellevue King Standard Yes 27.0 No
    6610 1424059154 2014-05-16 1270000 4 3.00 5520 8313 2.0 0 3 ... 8278 1 13833 May-2014 Bellevue King Standard Yes 40.0 No
    13951 9322800210 2014-05-20 879950 4 2.25 3500 13875 1.0 0 4 ... 15000 1 17375 May-2014 Seattle King Standard Yes 20.0 No
    13757 4219401236 2014-05-20 1690000 3 1.75 3400 8965 1.0 0 2 ... 8500 1 12365 May-2014 Seattle King Standard Yes 27.0 No
    10529 7784400130 2014-05-05 497300 6 2.75 3200 9200 1.0 0 2 ... 9500 0 12400 May-2014 Seattle King Standard Yes 26.0 No
    6832 486000510 2014-05-23 1330000 4 3.00 3370 7920 1.0 0 3 ... 7380 1 11290 May-2014 Seattle King Standard Yes 30.0 No
    2479 1624049293 2014-05-06 390000 5 3.75 2890 5000 1.0 0 0 ... 5117 0 7890 May-2014 Seattle King Standard Yes 37.0 No
    15539 7855200120 2014-05-09 1370000 4 2.75 3720 9450 1.0 0 4 ... 8605 1 13170 May-2014 Bellevue King Standard Yes 28.0 No
    2752 7922900040 2014-05-22 1080000 4 3.00 3600 9200 1.0 0 4 ... 9775 1 12800 May-2014 Bellevue King Standard Yes 28.0 No
    8532 3623500205 2014-05-13 2450000 4 4.50 5030 11023 2.0 0 2 ... 11490 1 16053 May-2014 Mercer Island King Standard Yes 31.0 No
    14866 5152700060 2014-05-28 465000 6 3.25 4250 23326 1.0 0 3 ... 15983 1 27576 May-2014 Federal Way King Standard Yes 15.0 No
    3344 4058800215 2014-05-28 430000 3 3.75 3890 7140 1.0 0 2 ... 7320 0 11030 May-2014 Seattle King Standard Yes 35.0 Yes
    3501 4122900190 2014-05-12 1350000 5 1.75 3380 20021 1.0 0 0 ... 19809 0 23401 May-2014 Bellevue King Standard Yes 14.0 No
    15783 217500140 2014-05-13 464000 5 2.50 3400 8970 1.0 0 0 ... 8475 0 12370 May-2014 Seattle King Standard Yes 27.0 No
    9331 5152100060 2014-05-29 472000 6 2.50 4410 14034 1.0 0 2 ... 13988 1 18444 May-2014 Federal Way King Standard Yes 24.0 No
    9349 3342700405 2014-05-22 585000 4 1.75 3000 42200 1.0 0 3 ... 9821 0 45200 May-2014 Renton King Standard Yes 7.0 No
    9279 4139420590 2014-05-20 1210000 4 3.50 4560 16643 1.0 0 3 ... 15177 1 21203 May-2014 Bellevue King Standard Yes 22.0 No
    520 1313000220 2014-05-13 675000 5 3.00 3410 9600 1.0 0 0 ... 9679 0 13010 May-2014 Redmond King Standard Yes 26.0 No
    769 7856410430 2014-05-30 1390000 6 2.75 5700 20000 1.0 0 4 ... 15700 1 25700 May-2014 Bellevue King Standard Yes 22.0 No
    18293 5425700205 2014-05-20 1800000 4 3.50 4460 16953 1.0 0 0 ... 13370 1 21413 May-2014 Medina King Standard Yes 21.0 Yes
    6088 9558050170 2014-05-13 475000 4 2.50 3740 8700 1.0 0 0 ... 6333 1 12440 May-2014 Renton King Standard Yes 30.0 No
    18107 1180008355 2014-05-07 380000 5 1.75 3000 6000 1.0 0 0 ... 7125 0 9000 May-2014 Seattle King Standard Yes 33.0 No
    17068 8562710550 2014-05-21 950000 5 3.75 5330 6000 2.0 0 2 ... 5797 1 11330 May-2014 Issaquah King Standard Yes 47.0 No
    21501 2021201000 2014-05-23 980000 4 3.00 3680 5854 1.0 0 3 ... 5000 1 9534 May-2014 Seattle King Standard Yes 39.0 No
    ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
    16301 9542000275 2015-04-06 675000 4 2.50 2420 18470 1.0 0 0 ... 13800 0 20890 April-2015 Bellevue King Standard Yes 12.0 No
    7062 3982700250 2015-04-23 799900 4 2.50 3030 7800 2.0 0 0 ... 7435 1 10830 April-2015 Kirkland King Standard Yes 28.0 No
    18757 8085400376 2015-04-21 2320000 4 3.50 5050 9520 2.0 0 0 ... 9248 1 14570 April-2015 Bellevue King Standard Yes 35.0 No
    17329 2655500235 2015-04-10 1610000 4 3.50 3920 19088 1.0 0 1 ... 13749 1 23008 April-2015 Mercer Island King Standard Yes 17.0 No
    5618 3524039202 2015-04-20 1070000 3 2.25 2950 7232 1.0 0 2 ... 7140 0 10182 April-2015 Seattle King Standard Yes 29.0 No
    18042 5460600110 2015-04-23 1050000 6 4.00 5310 12741 2.0 0 2 ... 12632 1 18051 April-2015 Mercer Island King Standard Yes 29.0 No
    17165 1736800520 2015-04-03 662500 3 2.50 3560 9796 1.0 0 0 ... 8925 0 13356 April-2015 Bellevue King Standard Yes 27.0 No
    17090 2141300080 2015-04-24 707000 5 2.50 3050 13212 1.0 0 0 ... 10826 0 16262 April-2015 Bellevue King Standard Yes 19.0 No
    15407 1373800330 2015-04-20 1120000 4 2.50 3690 11191 1.0 0 3 ... 8160 1 14881 April-2015 Seattle King Standard Yes 25.0 No
    14911 4147200040 2015-04-14 1090000 5 2.25 3650 13068 1.0 0 0 ... 13927 1 16718 April-2015 Mercer Island King Standard Yes 22.0 No
    2642 2425059074 2015-04-10 740000 5 3.00 3655 51836 1.0 0 0 ... 8606 0 55491 April-2015 Bellevue King Standard Yes 7.0 No
    2860 9808100150 2015-04-02 3350000 5 3.75 5350 15360 1.0 0 1 ... 15940 1 20710 April-2015 Bellevue King Standard Yes 26.0 No
    3498 9560500105 2015-04-24 957000 4 2.25 2860 11545 1.0 0 0 ... 11396 0 14405 April-2015 Bellevue King Standard Yes 20.0 No
    7793 629860010 2015-04-29 1350000 4 3.50 4640 9827 2.0 0 2 ... 8207 1 14467 April-2015 Issaquah King Standard Yes 32.0 No
    5853 7964410100 2015-05-04 700000 4 3.50 5360 25800 1.0 0 0 ... 21781 1 31160 May-2015 Sammamish King Standard Yes 17.0 No
    5500 4139420190 2015-05-12 2480000 4 5.00 5310 16909 1.0 0 4 ... 15701 1 22219 May-2015 Bellevue King Standard Yes 24.0 No
    12295 1742800430 2015-05-04 463828 5 1.75 3250 13702 1.0 0 2 ... 11328 0 16952 May-2015 Renton King Standard Yes 19.0 No
    5347 9541600490 2015-05-05 931088 4 2.50 3510 17400 1.0 0 0 ... 12120 1 20910 May-2015 Bellevue King Standard Yes 17.0 No
    13793 6065300840 2015-05-01 2850000 4 4.00 5040 17208 1.0 0 0 ... 18647 1 22248 May-2015 Bellevue King Standard Yes 23.0 No
    19013 1822079046 2015-05-04 500000 3 2.00 3040 41072 1.0 0 0 ... 54014 0 44112 May-2015 Maple Valley King Standard Yes 7.0 No
    5617 1925069082 2015-05-11 2200000 5 4.25 4640 22703 2.0 1 4 ... 14200 0 27343 May-2015 Redmond King Standard Yes 17.0 No
    7035 1180007375 2015-05-12 625000 5 3.50 4010 6000 2.0 0 3 ... 6000 1 10010 May-2015 Seattle King Standard Yes 40.0 No
    4032 7878400022 2015-05-06 390000 4 2.25 3060 7920 1.0 0 0 ... 7800 0 10980 May-2015 Seattle King Standard Yes 28.0 No
    1890 3336000050 2015-05-01 435000 6 3.00 3560 4290 1.0 0 0 ... 6000 0 7850 May-2015 Seattle King Standard Yes 45.0 No
    19313 8835401250 2015-05-06 1490000 6 2.75 4430 6440 2.0 0 3 ... 7314 1 10870 May-2015 Seattle King Standard Yes 41.0 Yes
    4404 3523069008 2015-05-05 890000 4 3.25 4360 210254 1.0 0 0 ... 87120 1 214614 May-2015 Maple Valley King Standard Yes 2.0 No
    20299 3286800260 2015-05-06 780000 5 2.50 3480 74052 1.0 0 0 ... 65775 0 77532 May-2015 Issaquah King Standard Yes 4.0 No
    4712 1924059254 2015-05-08 1300000 5 3.75 3490 15246 1.0 0 1 ... 15682 1 18736 May-2015 Mercer Island King Standard Yes 19.0 No
    8288 2524049108 2015-05-12 1380000 5 4.25 4050 18827 1.0 0 2 ... 25120 1 22877 May-2015 Mercer Island King Standard Yes 18.0 No
    15391 2925059260 2015-05-06 800000 5 2.50 3000 10560 1.0 0 0 ... 11616 0 13560 May-2015 Bellevue King Standard Yes 22.0 No

    408 rows × 30 columns

    We got 408 records as outliers, let's drop these outliers

    In [100]:
    #dropping the record from the dataset
    house_df.drop(house_df[ (house_df.basement > upperbound_base) | (house_df.basement < lowerbound_base) ].index, inplace=True)
    
    In [101]:
    house_df.shape
    
    Out[101]:
    (20594, 30)
    In [102]:
    #basement_measure
    plt.figure(figsize=(plotSizeX, plotSizeY))
    sns.distplot(house_df.basement)
    
    Out[102]:
    <matplotlib.axes._subplots.AxesSubplot at 0x22593a3e5f8>

    After treating outliers of basement, we can see that 400(~2%) data points got imputed. Total about 5% data has been imputed after treating ceil_measure and basement.

    In [103]:
    #Let's see the boxplot now for basement
    plt.figure(figsize=(plotSizeX, plotSizeY))
    sns.boxplot(house_df['basement'])
    
    Out[103]:
    <matplotlib.axes._subplots.AxesSubplot at 0x22593921d30>

    Treating outliers for column - living_measure

    In [104]:
    lowerbound_lim,upperbound_lim = outlier_treatment(house_df.living_measure)
    print(lowerbound_lim,upperbound_lim)
    
    -160.0 4000.0
    
    In [105]:
    house_df[(house_df.living_measure < lowerbound_lim) | (house_df.living_measure > upperbound_lim)]
    
    Out[105]:
    cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... lot_measure15 furnished total_area month_year City County Type has_basement HouseLandRatio has_renovated
    10110 6669100070 2014-05-12 900000 4 3.25 4700 38412 2.0 0 0 ... 35571 1 43112 May-2014 Bellevue King Standard Yes 11.0 No
    7275 2926069083 2014-05-07 900000 5 3.75 4130 226076 2.0 0 0 ... 55321 1 230206 May-2014 Woodinville King Standard Yes 2.0 No
    10549 6819100020 2014-05-29 1430000 4 4.25 4960 6000 2.5 0 0 ... 4080 1 10960 May-2014 Seattle King Standard Yes 45.0 Yes
    1438 5093300325 2014-05-23 1610000 4 3.50 4390 11600 2.0 0 3 ... 12000 1 15990 May-2014 Mercer Island King Standard Yes 27.0 No
    13897 7853280350 2014-05-12 809000 5 4.50 4630 6324 2.0 0 0 ... 6790 1 10954 May-2014 Snoqualmie King Standard Yes 42.0 No
    10530 6169901185 2014-05-20 490000 5 3.50 4460 2975 3.0 0 2 ... 4231 1 7435 May-2014 Seattle King Standard Yes 60.0 No
    6830 425079099 2014-05-07 560000 3 3.00 4120 60392 2.0 0 2 ... 64033 1 64512 May-2014 Carnation King Standard Yes 6.0 No
    2969 7853280550 2014-05-28 700000 4 3.50 4490 5099 2.0 0 0 ... 5537 1 9589 May-2014 Snoqualmie King Standard Yes 47.0 No
    2864 251620090 2014-05-30 2400000 4 3.25 4140 20734 1.0 0 1 ... 20008 1 24874 May-2014 Bellevue King Standard Yes 17.0 Yes
    2680 587550280 2014-05-30 625000 4 3.25 4240 25639 2.0 0 3 ... 24967 1 29879 May-2014 Federal Way King Standard Yes 14.0 No
    5059 1338600225 2014-05-28 1970000 8 3.50 4440 6480 2.0 0 3 ... 8640 1 10920 May-2014 Seattle King Standard Yes 41.0 No
    11490 526069024 2014-05-12 950000 5 3.00 4530 258746 1.5 0 0 ... 83199 1 263276 May-2014 Woodinville King Standard Yes 2.0 No
    17520 723000114 2014-05-05 1400000 5 3.50 4010 8510 2.0 0 1 ... 6128 1 12520 May-2014 Seattle King Standard Yes 32.0 No
    4965 8562710250 2014-05-05 890000 4 4.25 4420 5750 2.0 0 0 ... 5750 1 10170 May-2014 Issaquah King Standard Yes 43.0 No
    4596 8562710520 2014-05-05 890000 5 3.50 4490 6000 2.0 0 0 ... 6000 1 10490 May-2014 Issaquah King Standard Yes 43.0 No
    21557 3758900075 2014-05-07 1530000 5 4.50 4270 8076 2.0 0 0 ... 10631 1 12346 May-2014 Kirkland King Standard Yes 35.0 No
    1071 1924069039 2014-05-19 869000 5 3.25 4180 49222 2.0 0 0 ... 8029 0 53402 May-2014 Issaquah King Standard Yes 8.0 No
    15797 3127200021 2014-06-16 850000 4 3.50 4140 7089 2.0 0 0 ... 8896 1 11229 June-2014 Kirkland King Standard Yes 37.0 No
    5235 293760050 2014-06-27 1050000 4 4.25 4390 13833 2.0 0 3 ... 11652 1 18223 June-2014 Issaquah King Standard Yes 24.0 No
    19273 3629890190 2014-06-06 1300000 4 4.00 4270 6002 2.0 0 3 ... 5942 1 10272 June-2014 Issaquah King Standard Yes 42.0 No
    17631 1702901180 2014-06-11 665000 6 3.00 4250 4400 2.5 0 0 ... 4950 0 8650 June-2014 Seattle King Standard Yes 49.0 No
    4215 8043700300 2014-06-08 2700000 4 3.25 4420 7850 2.0 1 4 ... 8525 1 12270 June-2014 Bellevue King Standard Yes 36.0 No
    7571 3616600231 2014-06-03 960000 4 3.00 4590 9150 2.0 0 0 ... 12348 1 13740 June-2014 Seattle King Standard Yes 33.0 No
    17402 8128600060 2014-06-24 600000 4 3.25 4690 14930 2.0 0 2 ... 13320 1 19620 June-2014 Seattle King Standard Yes 24.0 No
    19073 5561300730 2014-06-05 530000 4 3.25 4160 35654 2.0 0 0 ... 35675 0 39814 June-2014 Issaquah King Standard Yes 10.0 No
    7431 5078400160 2014-06-05 1800000 5 4.50 4400 15580 2.0 0 0 ... 14249 1 19980 June-2014 Bellevue King Standard Yes 22.0 No
    21125 5700003630 2014-06-30 1930000 5 4.25 4830 8050 2.5 0 2 ... 9194 1 12880 June-2014 Seattle King Standard Yes 38.0 No
    1552 1336800010 2014-06-13 1340000 5 2.25 4200 5800 2.5 0 0 ... 5800 1 10000 June-2014 Seattle King Standard Yes 42.0 No
    14441 7853280570 2014-06-04 765000 4 3.00 4410 5104 2.0 0 0 ... 5537 1 9514 June-2014 Snoqualmie King Standard Yes 46.0 No
    372 7636800041 2014-06-25 995000 3 4.50 4380 47044 2.0 1 3 ... 18512 1 51424 June-2014 Seattle King Standard Yes 9.0 Yes
    ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
    11314 722059020 2015-03-18 550000 6 4.50 4520 40164 2.0 0 0 ... 13068 1 44684 March-2015 Kent King Standard Yes 10.0 Yes
    11089 745530180 2015-03-17 870000 5 3.50 4495 10079 2.0 0 0 ... 10079 1 14574 March-2015 Bothell King Standard Yes 31.0 No
    11340 9362000080 2015-03-16 1600000 5 3.50 4050 20925 2.0 0 3 ... 18321 1 24975 March-2015 Mercer Island King Standard Yes 16.0 Yes
    13844 1824079073 2015-03-31 985000 5 4.25 4650 108464 2.0 0 0 ... 155509 1 113114 March-2015 Fall City King Standard Yes 4.0 No
    9911 3026059085 2015-03-17 1290000 5 3.50 4090 290980 1.0 0 0 ... 9255 1 295070 March-2015 Kirkland King Standard Yes 1.0 No
    17302 1924059319 2015-03-20 1290000 5 4.00 4050 11358 2.0 0 0 ... 13555 1 15408 March-2015 Mercer Island King Standard Yes 26.0 No
    14676 1333300145 2015-03-04 2230000 3 4.00 4200 30120 2.0 0 2 ... 12200 1 34320 March-2015 Seattle King Standard Yes 12.0 No
    11625 2600010220 2015-03-26 1250000 4 2.50 4040 11350 2.0 0 2 ... 12382 1 15390 March-2015 Bellevue King Standard Yes 26.0 No
    8631 3616600003 2015-03-02 1680000 3 2.50 4090 16972 2.0 0 2 ... 16972 1 21062 March-2015 Seattle King Standard Yes 19.0 No
    14545 2579500101 2015-04-21 1390000 4 3.50 4010 10880 2.0 0 3 ... 17310 1 14890 April-2015 Mercer Island King Standard Yes 27.0 No
    567 9185700485 2015-04-01 2540000 4 3.50 4350 6000 2.0 0 0 ... 7200 1 10350 April-2015 Seattle King Standard Yes 42.0 No
    21062 3303980140 2015-04-02 1150000 4 3.00 4160 13170 2.0 0 0 ... 13148 1 17330 April-2015 Renton King Standard Yes 24.0 No
    12545 269000970 2015-04-02 1300000 5 3.75 4450 7680 2.0 0 0 ... 6400 1 12130 April-2015 Seattle King Standard Yes 37.0 No
    1512 1118000340 2015-04-08 3000000 5 3.75 4590 11265 2.0 0 0 ... 8996 1 15855 April-2015 Seattle King Standard Yes 29.0 No
    4292 1115300270 2015-04-28 900000 6 3.75 4210 6105 2.0 0 0 ... 6368 1 10315 April-2015 Renton King Standard Yes 41.0 No
    16585 6645950070 2015-04-01 1450000 4 3.50 5000 38012 2.0 0 0 ... 18054 1 43012 April-2015 Issaquah King Standard Yes 12.0 No
    16752 8562720420 2015-04-30 1350000 4 3.50 4740 8611 2.0 0 3 ... 8321 1 13351 April-2015 Issaquah King Standard Yes 36.0 No
    16743 1223089077 2015-04-01 718000 3 1.75 4060 136290 1.0 0 0 ... 51836 0 140350 April-2015 North Bend King Standard Yes 3.0 No
    7550 2260300060 2015-04-10 2580000 5 3.00 4780 20440 1.0 0 0 ... 20440 1 25220 April-2015 Medina King Standard Yes 19.0 No
    5755 1069000070 2015-04-15 2800000 5 3.25 4590 12793 2.0 0 2 ... 8609 1 17383 April-2015 Seattle King Standard Yes 26.0 No
    5706 4128500380 2015-04-27 1200000 4 2.50 4280 12796 2.0 0 0 ... 9593 1 17076 April-2015 Bellevue King Standard Yes 25.0 No
    5670 2254100090 2015-04-07 887250 5 3.50 4320 7502 2.0 0 0 ... 7538 1 11822 April-2015 Renton King Standard Yes 37.0 No
    5572 853200040 2015-04-28 2410000 5 2.50 4600 23250 1.5 0 2 ... 20066 1 27850 April-2015 Bellevue King Standard Yes 17.0 Yes
    7209 8562750060 2015-04-20 825000 5 3.50 4140 6770 2.0 0 0 ... 5431 1 10910 April-2015 Issaquah King Standard Yes 38.0 No
    19608 114101505 2015-04-23 630000 5 3.50 4060 8309 2.0 0 0 ... 11711 1 12369 April-2015 Kenmore King Standard Yes 33.0 No
    7997 5700004028 2015-04-17 2450000 4 4.25 4250 6552 2.0 0 3 ... 8841 1 10802 April-2015 Seattle King Standard Yes 39.0 No
    17159 1118000320 2015-05-08 3400000 4 4.00 4260 11765 2.0 0 0 ... 10408 1 16025 May-2015 Seattle King Standard Yes 27.0 Yes
    17742 5428000070 2015-05-11 770000 5 3.50 4750 8234 2.0 0 2 ... 14496 1 12984 May-2015 Seattle King Standard Yes 37.0 No
    16333 2421059090 2015-05-11 640000 4 2.50 4090 215186 2.0 0 0 ... 142005 0 219276 May-2015 Auburn King Standard Yes 2.0 No
    1152 1525069088 2015-05-04 442500 5 3.25 4240 226097 2.0 0 0 ... 217800 0 230337 May-2015 Redmond King Standard Yes 2.0 No

    178 rows × 30 columns

    We got 178 records as outliers. Let's treat this by dropping

    In [106]:
    #dropping the record from the dataset
    house_df.drop(house_df[ (house_df.living_measure > upperbound_lim) | (house_df.living_measure < lowerbound_lim) ].index, inplace=True)
    
    In [107]:
    #let's see the boxplot after dropping the outliers
    plt.figure(figsize=(plotSizeX, plotSizeY))
    sns.boxplot(house_df['living_measure'])
    
    Out[107]:
    <matplotlib.axes._subplots.AxesSubplot at 0x22593a3e240>
    In [108]:
    plt.figure(figsize=(plotSizeX, plotSizeY))
    sns.distplot(house_df.living_measure)
    
    Out[108]:
    <matplotlib.axes._subplots.AxesSubplot at 0x22595886198>

    By treating outliers of living_measure, we lost 178 data points more and data distribution looks normal

    In [109]:
    # shape of the data after imputing outliers in living_column
    house_df.shape
    
    Out[109]:
    (20416, 30)

    Treating outliers for column - lot_measure

    In [110]:
    lowerbound_lom,upperbound_lom = outlier_treatment(house_df.lot_measure)
    print(lowerbound_lom,upperbound_lom)
    
    -2774.875 17958.125
    
    In [111]:
    house_df[(house_df.lot_measure < lowerbound_lom) | (house_df.lot_measure > upperbound_lom)]
    
    Out[111]:
    cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... lot_measure15 furnished total_area month_year City County Type has_basement HouseLandRatio has_renovated
    10082 1121039059 2014-05-22 503000 2 1.75 2860 59612 1.0 1 4 ... 59612 0 62472 May-2014 Federal Way King Standard Yes 5.0 Yes
    14089 6070500055 2014-05-06 599000 4 2.25 2260 29930 2.0 0 0 ... 29930 0 32190 May-2014 Bellevue King Standard Yes 7.0 No
    1611 5561000190 2014-05-02 437500 3 2.25 1970 35100 2.0 0 0 ... 35100 1 37070 May-2014 Issaquah King Standard No 5.0 No
    14068 5111400086 2014-05-12 110000 3 1.00 1250 53143 1.0 0 0 ... 217800 0 54393 May-2014 Maple Valley King Standard No 2.0 No
    14081 3022039071 2014-05-30 800000 2 2.25 1730 31491 2.0 1 2 ... 12410 0 33221 May-2014 Vashon King Standard No 5.0 Yes
    20351 9808610190 2014-05-09 782000 4 2.50 2830 20345 2.0 0 0 ... 13732 1 23175 May-2014 Bellevue King Standard Yes 12.0 No
    9981 2324800350 2014-05-06 860000 4 2.00 3740 32417 2.0 0 0 ... 32417 1 36157 May-2014 Redmond King Standard No 10.0 No
    16273 1823069279 2014-05-20 499950 5 3.50 3200 43560 2.0 0 0 ... 43560 0 46760 May-2014 Renton King Standard No 7.0 No
    16325 7214700160 2014-05-09 610000 3 3.00 2480 45302 1.0 0 0 ... 14100 0 47782 May-2014 Woodinville King Standard Yes 5.0 No
    10030 2025700730 2014-05-02 287200 3 3.00 1850 19966 1.0 0 0 ... 6715 0 21816 May-2014 Maple Valley King Standard Yes 8.0 No
    3870 1330900250 2014-05-15 550000 3 2.25 1980 40887 1.0 0 0 ... 35700 0 42867 May-2014 Redmond King Standard No 5.0 No
    16422 4047200380 2014-05-26 460000 2 1.50 2730 19877 1.0 0 0 ... 19509 0 22607 May-2014 Duvall King Standard Yes 12.0 No
    3865 2924069132 2014-05-27 527500 3 1.75 2310 78844 1.0 0 0 ... 6230 0 81154 May-2014 Issaquah King Standard Yes 3.0 No
    1589 4045500510 2014-05-21 420850 1 1.00 960 40946 1.0 0 0 ... 20350 0 41906 May-2014 Carnation King Standard No 2.0 No
    3824 320069049 2014-05-14 305000 4 1.50 1590 131551 1.0 0 3 ... 108028 0 133141 May-2014 Enumclaw King Standard No 1.0 No
    18773 1321720140 2014-05-28 370000 4 2.50 3090 18645 2.0 0 0 ... 20114 1 21735 May-2014 Federal Way King Standard No 14.0 No
    14357 3210950080 2014-05-14 486000 4 2.50 2150 39449 1.0 0 0 ... 35717 0 41599 May-2014 Fall City King Standard Yes 5.0 No
    14310 1921069082 2014-05-12 560000 3 2.00 2560 216777 1.0 0 0 ... 108463 0 219337 May-2014 Auburn King Standard No 1.0 No
    16144 7574910780 2014-05-14 766950 3 2.50 3030 30007 1.5 0 0 ... 34983 1 33037 May-2014 Woodinville King Standard No 9.0 No
    9659 1023059365 2014-05-06 520000 3 2.50 2460 54885 2.0 0 0 ... 21407 0 57345 May-2014 Renton King Standard No 4.0 No
    7446 4012800010 2014-05-06 360000 4 2.00 2680 18768 1.0 0 0 ... 15750 0 21448 May-2014 Auburn King Standard No 12.0 No
    18970 3523089019 2014-05-19 480000 4 3.50 3370 435600 2.0 0 3 ... 114868 1 438970 May-2014 North Bend King Standard No 1.0 No
    7423 4188000670 2014-05-15 749400 4 2.50 3240 20301 2.0 0 0 ... 23650 1 23541 May-2014 Redmond King Standard No 14.0 No
    16149 9368700031 2014-05-09 195000 2 1.00 720 18000 1.0 0 0 ... 7925 0 18720 May-2014 Seattle King Standard No 4.0 No
    9913 8856000545 2014-05-07 100000 2 1.00 910 22000 1.0 0 0 ... 9891 0 22910 May-2014 Auburn King Standard No 4.0 No
    7214 124069032 2014-05-05 600000 3 1.75 1670 39639 1.0 0 0 ... 30492 0 41309 May-2014 Sammamish King Standard No 4.0 Yes
    7264 2724089019 2014-05-23 527550 1 0.75 820 59677 1.0 0 0 ... 14163 0 60497 May-2014 Snoqualmie King Standard No 1.0 No
    7323 3761700251 2014-05-28 600000 4 2.00 2510 38141 1.0 0 0 ... 11760 1 40651 May-2014 Kirkland King Standard No 6.0 No
    1809 226059103 2014-05-27 570000 3 1.75 1930 36210 1.0 0 0 ... 35060 0 38140 May-2014 Woodinville King Standard No 5.0 No
    13684 1721069036 2014-05-29 412000 3 1.75 1950 52256 1.0 0 0 ... 51836 0 54206 May-2014 Kent King Standard No 4.0 No
    ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
    19300 3924500130 2015-05-06 460000 2 2.50 1880 40575 1.0 0 0 ... 32935 1 42455 May-2015 Fall City King Standard No 4.0 No
    3221 1775700011 2015-05-12 390000 3 2.50 1410 26375 1.0 0 0 ... 12474 0 27785 May-2015 Woodinville King Standard No 5.0 No
    20818 126039394 2015-05-08 525000 4 2.75 2300 26650 1.0 0 0 ... 9879 0 28950 May-2015 Seattle King Standard No 8.0 No
    3195 9310300215 2015-05-06 652500 4 1.75 3130 18253 2.0 0 0 ... 12220 0 21383 May-2015 Seattle King Standard No 15.0 No
    13297 322069010 2015-05-08 435000 3 2.00 2570 233481 1.5 0 0 ... 157687 0 236051 May-2015 Maple Valley King Standard No 1.0 No
    11080 1823069088 2015-05-04 492000 2 1.75 1300 22239 1.0 0 0 ... 14810 0 23539 May-2015 Renton King Standard No 6.0 Yes
    11081 625069064 2015-05-07 625000 3 2.25 2570 47480 1.0 0 0 ... 106722 1 50050 May-2015 Redmond King Standard No 5.0 No
    9892 2124069103 2015-05-05 374000 3 1.75 1510 18439 1.0 0 0 ... 34326 0 19949 May-2015 Issaquah King Standard No 8.0 No
    4291 2426049079 2015-05-06 330000 3 1.00 1060 20040 1.0 0 0 ... 10800 0 21100 May-2015 Kirkland King Standard No 5.0 No
    9880 8011100050 2015-05-08 350000 2 1.00 1220 28703 1.0 0 0 ... 6720 0 29923 May-2015 Renton King Standard No 4.0 No
    4246 2722059275 2015-05-12 536000 3 2.75 2290 34548 2.0 0 3 ... 275299 0 36838 May-2015 Kent King Standard No 6.0 No
    13312 8835800450 2015-05-04 950000 3 2.50 2780 275033 1.0 0 0 ... 16340 1 277813 May-2015 North Bend King Standard No 1.0 No
    20752 1326069050 2015-05-04 750000 2 2.00 2370 155130 1.0 0 0 ... 14475 0 157500 May-2015 Duvall King Standard No 2.0 No
    2011 2591720160 2015-05-01 674950 3 2.75 3510 92347 2.0 0 0 ... 37070 1 95857 May-2015 Maple Valley King Standard No 4.0 No
    20375 302000375 2015-05-06 250000 3 2.00 1050 18304 1.0 0 0 ... 15675 0 19354 May-2015 Auburn King Standard No 5.0 No
    15300 722039087 2015-05-04 329000 2 1.00 990 57499 1.0 0 0 ... 27442 0 58489 May-2015 Vashon King Standard No 2.0 No
    19102 1774220070 2015-05-07 550000 4 2.25 2590 36256 2.0 0 0 ... 35657 0 38846 May-2015 Woodinville King Standard No 7.0 No
    9615 2316400285 2015-05-13 495000 4 3.50 2490 18042 2.0 0 0 ... 21107 0 20532 May-2015 Vashon King Standard No 12.0 No
    14043 9406510130 2015-05-05 448000 5 3.50 3740 24684 2.0 0 0 ... 26023 1 28424 May-2015 Maple Valley King Standard Yes 13.0 No
    10637 522079068 2015-05-06 513000 3 2.50 2150 161607 2.0 0 0 ... 207781 0 163757 May-2015 Maple Valley King Standard Yes 1.0 No
    6109 251610020 2015-05-08 1580000 4 2.75 3480 19991 2.0 0 2 ... 20271 1 23471 May-2015 Bellevue King Standard Yes 15.0 No
    12897 4027701265 2015-05-01 480000 3 1.75 2920 21375 1.0 0 0 ... 8482 0 24295 May-2015 Kenmore King Standard Yes 12.0 No
    1591 4166600610 2015-05-14 335000 3 2.00 1410 44866 1.0 0 0 ... 29152 0 46276 May-2015 Federal Way King Standard No 3.0 No
    6116 122029066 2015-05-08 490000 3 1.75 2020 215622 2.0 0 0 ... 215622 0 217642 May-2015 Vashon King Standard No 1.0 No
    7911 3585900460 2015-05-01 1060000 6 2.75 2980 20000 1.0 0 4 ... 20000 0 22980 May-2015 Seattle King Standard Yes 13.0 No
    11329 2320069111 2015-05-07 449999 4 1.75 2290 36900 1.5 0 2 ... 12434 0 39190 May-2015 Enumclaw King Standard Yes 6.0 No
    9752 2521059060 2015-05-01 490000 3 2.25 2840 107157 2.0 0 0 ... 215622 1 109997 May-2015 Auburn King Standard No 3.0 No
    4012 6446200050 2015-05-04 540000 3 1.75 2590 25992 1.0 0 0 ... 29250 0 28582 May-2015 Issaquah King Standard Yes 9.0 No
    16888 3422059208 2015-05-11 390000 3 2.50 1930 64904 1.0 0 0 ... 57500 0 66834 May-2015 Kent King Standard No 3.0 No
    4579 1921069101 2015-05-08 399000 3 1.75 2170 73616 1.0 0 0 ... 297514 0 75786 May-2015 Auburn King Standard No 3.0 No

    2128 rows × 30 columns

    We got 2155 records which are outliers. Let's drop these outlier records.

    In [112]:
    #dropping the record from the dataset
    house_df.drop(house_df[ (house_df.lot_measure > upperbound_lom) | (house_df.lot_measure < lowerbound_lom) ].index, inplace=True)
    
    In [113]:
    #let's plot after treating outliers
    plt.figure(figsize=(plotSizeX, plotSizeY))
    sns.boxplot(house_df['lot_measure'])
    
    Out[113]:
    <matplotlib.axes._subplots.AxesSubplot at 0x22593975eb8>
    In [114]:
    house_df.shape
    
    Out[114]:
    (18288, 30)

    Total outliers in the lot_measure are 2128 data points. But still we are going ahead with imputing the data. We will analyze later whether there is any impact on the data set or not.

    Treating outliers for column - room_bed

    In [115]:
    #As we know for room_bed = 33 was outlier from our earlier findings, let's see the record and drop it
    house_df[house_df['room_bed']==33]
    
    Out[115]:
    cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... lot_measure15 furnished total_area month_year City County Type has_basement HouseLandRatio has_renovated
    750 2402100895 2014-06-25 640000 33 1.75 1620 6000 1.0 0 0 ... 4700 0 7620 June-2014 Seattle King Standard Yes 21.0 No

    1 rows × 30 columns

    In [116]:
    #dropping the record from the dataset
    house_df.drop(house_df[ (house_df.room_bed == 33) ].index, inplace=True)
    
    In [117]:
    house_df.shape
    
    Out[117]:
    (18287, 30)
    In summary, after treating outliers, we have lost about 15% of the data. We will analyse the impact of this data loss during the model evaluation.
    In [118]:
    #let's see the feature/columns and drop the unneccessary features
    house_df.columns
    
    Out[118]:
    Index(['cid', 'dayhours', 'price', 'room_bed', 'room_bath', 'living_measure',
           'lot_measure', 'ceil', 'coast', 'sight', 'condition', 'quality',
           'ceil_measure', 'basement', 'yr_built', 'yr_renovated', 'zipcode',
           'lat', 'long', 'living_measure15', 'lot_measure15', 'furnished',
           'total_area', 'month_year', 'City', 'County', 'Type', 'has_basement',
           'HouseLandRatio', 'has_renovated'],
          dtype='object')

    As we already have this information in other features. We will drop the unwanted columns from new copied dataframe instance : cid,dayhours,yr_renovated,zipcode,lat,long,county,type

    In [119]:
    #Let's create another dataframe for modeling
    df_model=house_df.copy()
    
    In [120]:
    #let's check the new copy of dataframe by printing first few records
    df_model.head()
    
    Out[120]:
    cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... lot_measure15 furnished total_area month_year City County Type has_basement HouseLandRatio has_renovated
    17786 7568700740 2014-05-21 430000 3 2.75 2550 11160 2.0 0 0 ... 7440 0 13710 May-2014 Seattle King Standard No 19.0 No
    3782 2248000080 2014-05-21 385500 3 2.00 1540 7947 1.0 0 0 ... 7950 0 9487 May-2014 Bothell King Standard Yes 16.0 No
    10069 7805450110 2014-05-06 736000 4 2.50 2290 12047 2.0 0 0 ... 15666 1 14337 May-2014 Bellevue King Standard No 16.0 No
    7114 2215500080 2014-05-28 580000 5 2.00 1940 6000 1.0 0 0 ... 6000 0 7940 May-2014 Seattle King Standard Yes 24.0 No
    10080 1219000043 2014-05-09 315000 5 1.75 2320 8100 1.0 0 0 ... 7271 0 10420 May-2014 Seattle King Standard Yes 22.0 No

    5 rows × 30 columns

    New instance of dataframe for model created successfully

    In [121]:
    #let's verify the columns
    df_model.columns
    
    Out[121]:
    Index(['cid', 'dayhours', 'price', 'room_bed', 'room_bath', 'living_measure',
           'lot_measure', 'ceil', 'coast', 'sight', 'condition', 'quality',
           'ceil_measure', 'basement', 'yr_built', 'yr_renovated', 'zipcode',
           'lat', 'long', 'living_measure15', 'lot_measure15', 'furnished',
           'total_area', 'month_year', 'City', 'County', 'Type', 'has_basement',
           'HouseLandRatio', 'has_renovated'],
          dtype='object')
    In [122]:
    #Dropping the feature not required in 1st Iteration
    df_final=df_model.drop(['cid','dayhours','yr_renovated','zipcode','lat','long','County','Type'],axis=1)
    
    In [123]:
    df_final.shape
    
    Out[123]:
    (18287, 22)
    In [124]:
    df_final.head()
    
    Out[124]:
    price room_bed room_bath living_measure lot_measure ceil coast sight condition quality ... yr_built living_measure15 lot_measure15 furnished total_area month_year City has_basement HouseLandRatio has_renovated
    17786 430000 3 2.75 2550 11160 2.0 0 0 3 8 ... 1994 1020 7440 0 13710 May-2014 Seattle No 19.0 No
    3782 385500 3 2.00 1540 7947 1.0 0 0 3 7 ... 1961 1910 7950 0 9487 May-2014 Bothell Yes 16.0 No
    10069 736000 4 2.50 2290 12047 2.0 0 0 4 9 ... 1988 3130 15666 1 14337 May-2014 Bellevue No 16.0 No
    7114 580000 5 2.00 1940 6000 1.0 0 0 5 7 ... 1945 1700 6000 0 7940 May-2014 Seattle Yes 24.0 No
    10080 315000 5 1.75 2320 8100 1.0 0 0 4 7 ... 1956 1410 7271 0 10420 May-2014 Seattle Yes 22.0 No

    5 rows × 22 columns

    In [125]:
    df_final.columns
    
    Out[125]:
    Index(['price', 'room_bed', 'room_bath', 'living_measure', 'lot_measure',
           'ceil', 'coast', 'sight', 'condition', 'quality', 'ceil_measure',
           'basement', 'yr_built', 'living_measure15', 'lot_measure15',
           'furnished', 'total_area', 'month_year', 'City', 'has_basement',
           'HouseLandRatio', 'has_renovated'],
          dtype='object')
    Creating dummies for categorical variables: 'room_bed', 'room_bath', 'ceil', 'coast', 'sight', 'condition', 'quality', 'furnished','City', 'has_basement', 'has_renovated'
    In [126]:
    # Getting dummies for columns ceil, coast, sight, condition, quality, yr_renovated, furnished
    dff = pd.get_dummies(df_final, columns=['room_bed', 'room_bath', 'ceil', 'coast', 'sight', 'condition', 'quality', 'furnished','City', 
                                            'has_basement', 'has_renovated'],drop_first=True)
    
    In [127]:
    # let's see the data types of the features
    dff.shape
    
    Out[127]:
    (18287, 92)
    In [128]:
    dff.columns
    
    Out[128]:
    Index(['price', 'living_measure', 'lot_measure', 'ceil_measure', 'basement',
           'yr_built', 'living_measure15', 'lot_measure15', 'total_area',
           'month_year', 'HouseLandRatio', 'room_bed_1', 'room_bed_2',
           'room_bed_3', 'room_bed_4', 'room_bed_5', 'room_bed_6', 'room_bed_7',
           'room_bed_8', 'room_bed_9', 'room_bed_10', 'room_bed_11',
           'room_bath_0.5', 'room_bath_0.75', 'room_bath_1.0', 'room_bath_1.25',
           'room_bath_1.5', 'room_bath_1.75', 'room_bath_2.0', 'room_bath_2.25',
           'room_bath_2.5', 'room_bath_2.75', 'room_bath_3.0', 'room_bath_3.25',
           'room_bath_3.5', 'room_bath_3.75', 'room_bath_4.0', 'room_bath_4.25',
           'room_bath_4.5', 'room_bath_4.75', 'room_bath_5.0', 'room_bath_5.25',
           'room_bath_5.75', 'ceil_1.5', 'ceil_2.0', 'ceil_2.5', 'ceil_3.0',
           'ceil_3.5', 'coast_1', 'sight_1', 'sight_2', 'sight_3', 'sight_4',
           'condition_2', 'condition_3', 'condition_4', 'condition_5', 'quality_4',
           'quality_5', 'quality_6', 'quality_7', 'quality_8', 'quality_9',
           'quality_10', 'quality_11', 'quality_12', 'furnished_1',
           'City_Bellevue', 'City_Black Diamond', 'City_Bothell', 'City_Carnation',
           'City_Duvall', 'City_Enumclaw', 'City_Fall City', 'City_Federal Way',
           'City_Issaquah', 'City_Kenmore', 'City_Kent', 'City_Kirkland',
           'City_Maple Valley', 'City_Medina', 'City_Mercer Island',
           'City_North Bend', 'City_Redmond', 'City_Renton', 'City_Sammamish',
           'City_Seattle', 'City_Snoqualmie', 'City_Vashon', 'City_Woodinville',
           'has_basement_Yes', 'has_renovated_Yes'],
          dtype='object')

    Ready for model building

    'dff' is the data frame which is ready for modeling

    In [129]:
    dff.head()
    
    Out[129]:
    price living_measure lot_measure ceil_measure basement yr_built living_measure15 lot_measure15 total_area month_year ... City_North Bend City_Redmond City_Renton City_Sammamish City_Seattle City_Snoqualmie City_Vashon City_Woodinville has_basement_Yes has_renovated_Yes
    17786 430000 2550 11160 2550 0 1994 1020 7440 13710 May-2014 ... 0 0 0 0 1 0 0 0 0 0
    3782 385500 1540 7947 1120 420 1961 1910 7950 9487 May-2014 ... 0 0 0 0 0 0 0 0 1 0
    10069 736000 2290 12047 2290 0 1988 3130 15666 14337 May-2014 ... 0 0 0 0 0 0 0 0 0 0
    7114 580000 1940 6000 970 970 1945 1700 6000 7940 May-2014 ... 0 0 0 0 1 0 0 0 1 0
    10080 315000 2320 8100 1160 1160 1956 1410 7271 10420 May-2014 ... 0 0 0 0 1 0 0 0 1 0

    5 rows × 92 columns

    In [130]:
    #let's drop the month_year column as we already analyzed it
    dff=dff.drop(['month_year'],axis=1)
    
    In [131]:
    #Creating X, y for training and testing set
    X = dff.drop("price" , axis=1)
    y = dff["price"]
    
    In [132]:
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=10)
    
    In [133]:
    print(X_train.shape)
    print(X_test.shape)
    print(X_val.shape)
    
    (11703, 90)
    (3658, 90)
    (2926, 90)
    
    In [134]:
    dff.head()
    
    Out[134]:
    price living_measure lot_measure ceil_measure basement yr_built living_measure15 lot_measure15 total_area HouseLandRatio ... City_North Bend City_Redmond City_Renton City_Sammamish City_Seattle City_Snoqualmie City_Vashon City_Woodinville has_basement_Yes has_renovated_Yes
    17786 430000 2550 11160 2550 0 1994 1020 7440 13710 19.0 ... 0 0 0 0 1 0 0 0 0 0
    3782 385500 1540 7947 1120 420 1961 1910 7950 9487 16.0 ... 0 0 0 0 0 0 0 0 1 0
    10069 736000 2290 12047 2290 0 1988 3130 15666 14337 16.0 ... 0 0 0 0 0 0 0 0 0 0
    7114 580000 1940 6000 970 970 1945 1700 6000 7940 24.0 ... 0 0 0 0 1 0 0 0 1 0
    10080 315000 2320 8100 1160 1160 1956 1410 7271 10420 22.0 ... 0 0 0 0 1 0 0 0 1 0

    5 rows × 91 columns

    Model building

    Let's build the model and see their performances

    Linear Regression (with Ridge and Lasso)

    In [135]:
    #importing the necessary libraries
    from sklearn.linear_model import LinearRegression
    from sklearn.linear_model import Ridge
    from sklearn.linear_model import Lasso
    
    from sklearn import metrics
    from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
    
    In [136]:
    LR1 = LinearRegression()
    LR1.fit(X_train, y_train)
    #predicting result over test data
    y_LR1_predtr= LR1.predict(X_train)
    y_LR1_predvl= LR1.predict(X_val)
    
    LR1.coef_
    
    Out[136]:
    array([ 4.65340600e+01, -2.60438290e+01,  4.15085952e+01,  5.02547332e+00,
           -2.04412513e+03,  5.35189660e+01, -1.85866571e+00,  2.04902213e+01,
            1.71493571e+02, -1.52366392e+04, -1.71559917e+02, -7.41190980e+03,
           -2.49409439e+04, -3.38428650e+04, -7.05629386e+04, -1.39570745e+05,
           -6.13484472e+04, -5.01063903e+04, -1.54205825e+05, -2.24656830e+05,
            1.25982909e+04,  8.20853745e+04,  9.07899656e+04,  2.27686059e+05,
            9.03633165e+04,  1.00072705e+05,  1.08362140e+05,  1.12686599e+05,
            1.12536191e+05,  1.13322504e+05,  1.30547366e+05,  1.77676262e+05,
            1.64433901e+05,  2.85527951e+05,  1.71382012e+05,  1.61500051e+05,
            1.74737226e+05,  9.19752797e+05,  1.55294652e+05,  2.95336027e+05,
           -7.18864612e-09,  1.64060215e+04,  1.53741629e+04,  5.55177883e+04,
            5.71678128e+04,  7.56908538e+04,  2.60659481e+05,  4.01215839e+04,
            4.57278795e+04,  1.17418144e+05,  2.55845105e+05,  9.85063804e+04,
            1.30265390e+05,  1.58016139e+05,  1.95914534e+05, -1.52225301e+05,
           -1.50728574e+05, -1.30295440e+05, -5.38413267e+04,  1.41195146e+04,
           -3.31240159e+05, -2.05150668e+05,  3.46837714e+04,  9.74678182e+05,
            4.72971126e+05,  2.94909682e+05,  1.32458120e+05,  1.23710820e+05,
            1.76625819e+05,  1.09165616e+05,  1.81816493e+04,  1.72163167e+05,
           -1.34952467e+04,  1.66521627e+05,  1.21190592e+05,  1.50478273e+04,
            2.33952638e+05,  4.07406448e+04,  8.02211265e+05,  4.19684384e+05,
            1.32976614e+05,  2.32526660e+05,  6.01053338e+04,  1.61179525e+05,
            1.73301885e+05,  1.04083333e+05,  8.65668543e+04,  1.56470485e+05,
            2.84414383e+04,  3.11287808e+04])
    In [137]:
    #Model score and Deduction for each Model in a DataFrame
    LR1_trscore=r2_score(y_train,y_LR1_predtr)
    LR1_trRMSE=np.sqrt(mean_squared_error(y_train, y_LR1_predtr))
    LR1_trMSE=mean_squared_error(y_train, y_LR1_predtr)
    LR1_trMAE=mean_absolute_error(y_train, y_LR1_predtr)
    
    LR1_vlscore=r2_score(y_val,y_LR1_predvl)
    LR1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_LR1_predvl))
    LR1_vlMSE=mean_squared_error(y_val, y_LR1_predvl)
    LR1_vlMAE=mean_absolute_error(y_val, y_LR1_predvl)
    
    Compa_df=pd.DataFrame({'Method':['Linear Reg Model1'],'Val Score':LR1_vlscore,'RMSE_vl': LR1_vlRMSE, 'MSE_vl': LR1_vlMSE, 'MAE_vl': LR1_vlMAE,'train Score':LR1_trscore,'RMSE_tr': LR1_trRMSE, 'MSE_tr': LR1_trMSE, 'MAE_tr': LR1_trMAE})
    
    #Compa_df = Compa_df[['Method', 'Test Score', 'RMSE', 'MSE', 'MAE']]
    
    Compa_df
    
    Out[137]:
    Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786

    The linear regression model performed with scores 0.73 & .72 in training data set and validation data set respectively

    In [138]:
    sns.set(style="darkgrid", color_codes=True)
                
    with sns.axes_style("white"):
        sns.jointplot(x=y_val, y=y_LR1_predvl, kind="reg", color="k")
    

    Lasso model

    In [139]:
    Lasso1 = Lasso(alpha=1)
    Lasso1.fit(X_train, y_train)
    
    #predicting result over test data
    y_Lasso1_predtr= Lasso1.predict(X_train)
    y_Lasso1_predvl= Lasso1.predict(X_val)
    
    Lasso1.coef_
    
    C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:492: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
      ConvergenceWarning)
    
    Out[139]:
    array([ 9.65902931e+01, -3.93953142e+00,  1.35529095e+01, -2.29061776e+01,
           -2.04163675e+03,  5.35553554e+01, -1.85807486e+00, -1.60949858e+00,
            1.73465396e+02,  2.58275091e+04,  4.11967331e+04,  3.40052776e+04,
            1.64885732e+04,  7.61896026e+03, -2.86912815e+04, -9.69275851e+04,
           -1.65210854e+04, -4.12121513e+03, -1.00992716e+05, -1.71298527e+05,
           -9.13241618e+04, -2.48925268e+04, -1.60169381e+04,  1.18516529e+05,
           -1.64860326e+04, -6.79525494e+03,  1.46220219e+03,  5.75895038e+03,
            5.60351070e+03,  6.32472345e+03,  2.34670403e+04,  7.06751057e+04,
            5.73457320e+04,  1.78131982e+05,  6.38573909e+04,  5.27207711e+04,
            6.65775444e+04,  8.00797041e+05,  4.05485728e+04,  1.84114085e+05,
            0.00000000e+00,  1.64275884e+04,  1.53464259e+04,  5.53276759e+04,
            5.70768344e+04,  7.27284380e+04,  2.60559411e+05,  3.99774418e+04,
            4.57348433e+04,  1.17367493e+05,  2.55764846e+05,  9.44643505e+04,
            1.25980055e+05,  1.53729964e+05,  1.91681487e+05, -1.98077420e+05,
           -2.00695519e+05, -1.80295562e+05, -1.03865243e+05, -3.58821013e+04,
            5.98641869e+04,  1.85986501e+05,  4.26066854e+05,  1.35396787e+06,
            3.18911457e+04,  2.94465826e+05,  1.31572941e+05,  1.23199746e+05,
            1.75754190e+05,  1.08625950e+05,  1.76695808e+04,  1.70928679e+05,
           -1.38642363e+04,  1.66073186e+05,  1.20696485e+05,  1.45879439e+04,
            2.33538555e+05,  4.03015196e+04,  8.01308223e+05,  4.19176853e+05,
            1.32466072e+05,  2.32091780e+05,  5.96809785e+04,  1.60725157e+05,
            1.72962017e+05,  1.03839270e+05,  8.54647044e+04,  1.55943713e+05,
            2.84495484e+04,  3.11881238e+04])
    In [140]:
    #Model score and Deduction for each Model in a DataFrame
    Lasso1_trscore=r2_score(y_train,y_Lasso1_predtr)
    Lasso1_trRMSE=np.sqrt(mean_squared_error(y_train, y_Lasso1_predtr))
    Lasso1_trMSE=mean_squared_error(y_train, y_Lasso1_predtr)
    Lasso1_trMAE=mean_absolute_error(y_train, y_Lasso1_predtr)
    
    Lasso1_vlscore=r2_score(y_val,y_Lasso1_predvl)
    Lasso1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_Lasso1_predvl))
    Lasso1_vlMSE=mean_squared_error(y_val, y_Lasso1_predvl)
    Lasso1_vlMAE=mean_absolute_error(y_val, y_Lasso1_predvl)
    
    Lasso1_df=pd.DataFrame({'Method':['Linear-Reg Lasso1'],'Val Score':Lasso1_vlscore,'RMSE_vl': Lasso1_vlRMSE, 'MSE_vl': Lasso1_vlMSE, 'MAE_vl': Lasso1_vlMAE,'train Score':Lasso1_trscore,'RMSE_tr': Lasso1_trRMSE, 'MSE_tr': Lasso1_trMSE, 'MAE_tr': Lasso1_trMAE})
    Compa_df = pd.concat([Compa_df, Lasso1_df])
    
    Compa_df
    
    Out[140]:
    Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786
    0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117

    The lasso linear regression model performed with scores 0.73 & .72 in training data set and validation data set respectively. The coefficeints of 1 variable in lasso model is almost '0', signifying that the variable with '0' coefficient can be dropped.

    In [141]:
    sns.set(style="darkgrid", color_codes=True)
                
    with sns.axes_style("white"):
        sns.jointplot(x=y_val, y=y_Lasso1_predvl, kind="reg", color="k")
    

    Ridge model

    In [142]:
    Ridge1 = Ridge(alpha=0.5)
    Ridge1.fit(X_train, y_train)
    
    #predicting result over test data
    y_Ridge1_predtr= Ridge1.predict(X_train)
    y_Ridge1_predvl= Ridge1.predict(X_val)
    
    Ridge1.coef_
    
    Out[142]:
    array([ 4.66834622e+01, -2.60918244e+01,  4.15530007e+01,  5.13138900e+00,
           -2.04329070e+03,  5.40305072e+01, -1.83894732e+00,  2.05899911e+01,
            1.99149390e+02,  4.75922037e+04,  6.34342640e+04,  5.62050467e+04,
            3.84394798e+04,  2.95688617e+04, -6.95633001e+03, -7.29503951e+04,
            2.07847666e+03,  1.15197429e+04, -6.08136975e+04, -1.07506532e+05,
           -1.24763131e+05, -6.99157975e+04, -6.12127626e+04,  6.84637990e+04,
           -6.18022272e+04, -5.21975799e+04, -4.39150711e+04, -3.96547935e+04,
           -4.00644775e+04, -3.93829165e+04, -2.22301737e+04,  2.63190006e+04,
            1.13660948e+04,  1.31445711e+05,  1.79480665e+04,  7.08429226e+03,
            2.07235857e+04,  5.06032623e+05,  1.96912226e+03,  1.26354803e+05,
            0.00000000e+00,  1.62595610e+04,  1.52860502e+04,  5.48436035e+04,
            5.68361232e+04,  6.73075467e+04,  2.58145115e+05,  3.96081664e+04,
            4.58498930e+04,  1.16570206e+05,  2.54669644e+05,  8.12334931e+04,
            1.12963518e+05,  1.40760828e+05,  1.78571191e+05, -1.31486307e+05,
           -1.40442366e+05, -1.20195568e+05, -4.38816320e+04,  2.39956530e+04,
           -2.60492990e+05, -1.34329618e+05,  1.11752710e+05,  6.95080119e+05,
            4.12010220e+05,  2.90368008e+05,  1.26135853e+05,  1.18985675e+05,
            1.69604765e+05,  1.04432882e+05,  1.42752316e+04,  1.63147681e+05,
           -1.74612788e+04,  1.62055462e+05,  1.16592568e+05,  1.10585170e+04,
            2.29612234e+05,  3.67503018e+04,  7.71949916e+05,  4.13369607e+05,
            1.28389461e+05,  2.28151004e+05,  5.60163760e+04,  1.56620270e+05,
            1.69467429e+05,  9.98907258e+04,  8.00564608e+04,  1.51440054e+05,
            2.85123132e+04,  3.11686377e+04])
    In [143]:
    #Model score and Deduction for each Model in a DataFrame
    Ridge1_trscore=r2_score(y_train,y_Ridge1_predtr)
    Ridge1_trRMSE=np.sqrt(mean_squared_error(y_train, y_Ridge1_predtr))
    Ridge1_trMSE=mean_squared_error(y_train, y_Ridge1_predtr)
    Ridge1_trMAE=mean_absolute_error(y_train, y_Ridge1_predtr)
    
    Ridge1_vlscore=r2_score(y_val,y_Ridge1_predvl)
    Ridge1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_Ridge1_predvl))
    Ridge1_vlMSE=mean_squared_error(y_val, y_Ridge1_predvl)
    Ridge1_vlMAE=mean_absolute_error(y_val, y_Ridge1_predvl)
    
    Ridge1_df=pd.DataFrame({'Method':['Linear-Reg Ridge1'],'Val Score':Ridge1_vlscore,'RMSE_vl': Ridge1_vlRMSE, 'MSE_vl': Ridge1_vlMSE, 'MAE_vl': Ridge1_vlMAE,'train Score':Ridge1_trscore,'RMSE_tr': Ridge1_trRMSE, 'MSE_tr': Ridge1_trMSE, 'MAE_tr': Ridge1_trMAE})
    Compa_df = pd.concat([Compa_df, Ridge1_df])
    
    Compa_df
    
    Out[143]:
    Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786
    0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117
    0 Linear-Reg Ridge1 0.718929 137689.597398 1.895843e+10 93992.809617 0.729789 133037.735155 1.769904e+10 92497.255174

    The Ridge linear regression model performed with scores 0.73 & .72 in training data set and validation data set respectively. The coefficeints of variables in ridge model are all non-zero, indicating that non of the variables can be dropped.

    In [144]:
    sns.set(style="darkgrid", color_codes=True)
                
    with sns.axes_style("white"):
        sns.jointplot(x=y_val, y=y_Ridge1_predvl, kind="reg", color="k")
    

    In summary, Linear models have performed almost with similar results in both regularized model and non-regularized models

    KNN Regressor

    In [145]:
    from sklearn.neighbors import KNeighborsRegressor
    
    In [146]:
    knn1 = KNeighborsRegressor(n_neighbors=4,weights='distance')
    knn1.fit(X_train, y_train)
    
    #predicting result over test data
    y_knn1_predtr= knn1.predict(X_train)
    y_knn1_predvl= knn1.predict(X_val)
    
    In [147]:
    #Model score and Deduction for each Model in a DataFrame
    knn1_trscore=r2_score(y_train,y_knn1_predtr)
    knn1_trRMSE=np.sqrt(mean_squared_error(y_train, y_knn1_predtr))
    knn1_trMSE=mean_squared_error(y_train, y_knn1_predtr)
    knn1_trMAE=mean_absolute_error(y_train, y_knn1_predtr)
    
    knn1_vlscore=r2_score(y_val,y_knn1_predvl)
    knn1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_knn1_predvl))
    knn1_vlMSE=mean_squared_error(y_val, y_knn1_predvl)
    knn1_vlMAE=mean_absolute_error(y_val, y_knn1_predvl)
    
    knn1_df=pd.DataFrame({'Method':['knn1'],'Val Score':knn1_vlscore,'RMSE_vl': knn1_vlRMSE, 'MSE_vl': knn1_vlMSE, 'MAE_vl': knn1_vlMAE,'train Score':knn1_trscore,'RMSE_tr': knn1_trRMSE, 'MSE_tr': knn1_trMSE, 'MAE_tr': knn1_trMAE})
    Compa_df = pd.concat([Compa_df, knn1_df])
    
    Compa_df
    
    Out[147]:
    Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786
    0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117
    0 Linear-Reg Ridge1 0.718929 137689.597398 1.895843e+10 93992.809617 0.729789 133037.735155 1.769904e+10 92497.255174
    0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707

    Though KNN regressor performed well in training set, the performance score in validation set is very less. This shows that the model is overfitted in training set

    Support vector regressor

    In [148]:
    from sklearn.svm import SVR
    
    In [149]:
    SVR1 = SVR(gamma='auto',C=10.0, epsilon=0.2,kernel='rbf')
    SVR1.fit(X_train, y_train)
    
    y_SVR1_predtr= SVR1.predict(X_train)
    y_SVR1_predvl= SVR1.predict(X_val)
    
    In [150]:
    #Model score and Deduction for each Model in a DataFrame
    SVR1_trscore=r2_score(y_train,y_SVR1_predtr)
    SVR1_trRMSE=np.sqrt(mean_squared_error(y_train, y_SVR1_predtr))
    SVR1_trMSE=mean_squared_error(y_train, y_SVR1_predtr)
    SVR1_trMAE=mean_absolute_error(y_train, y_SVR1_predtr)
    
    SVR1_vlscore=r2_score(y_val,y_SVR1_predvl)
    SVR1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_SVR1_predvl))
    SVR1_vlMSE=mean_squared_error(y_val, y_SVR1_predvl)
    SVR1_vlMAE=mean_absolute_error(y_val, y_SVR1_predvl)
    
    SVR1_df=pd.DataFrame({'Method':['SVR1'],'Val Score':SVR1_vlscore,'RMSE_vl': SVR1_vlRMSE, 'MSE_vl': SVR1_vlMSE, 'MAE_vl': SVR1_vlMAE,'train Score':SVR1_trscore,'RMSE_tr': SVR1_trRMSE, 'MSE_tr': SVR1_trMSE, 'MAE_tr': SVR1_trMAE})
    Compa_df = pd.concat([Compa_df, SVR1_df])
    
    Compa_df
    
    Out[150]:
    Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786
    0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117
    0 Linear-Reg Ridge1 0.718929 137689.597398 1.895843e+10 93992.809617 0.729789 133037.735155 1.769904e+10 92497.255174
    0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707
    0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170

    The above negative scores in SVR model is due to non-learning of the model in the training set which results in non-performance in validation set

    In [151]:
    SVR2 = SVR(gamma='auto',C=0.1,kernel='linear')
    SVR2.fit(X_train, y_train)
    
    y_SVR2_predtr= SVR2.predict(X_train)
    y_SVR2_predvl= SVR2.predict(X_val)
    
    #Model score and Deduction for each Model in a DataFrame
    SVR2_trscore=r2_score(y_train,y_SVR2_predtr)
    SVR2_trRMSE=np.sqrt(mean_squared_error(y_train, y_SVR2_predtr))
    SVR2_trMSE=mean_squared_error(y_train, y_SVR2_predtr)
    SVR2_trMAE=mean_absolute_error(y_train, y_SVR2_predtr)
    
    SVR2_vlscore=r2_score(y_val,y_SVR2_predvl)
    SVR2_vlRMSE=np.sqrt(mean_squared_error(y_val, y_SVR2_predvl))
    SVR2_vlMSE=mean_squared_error(y_val, y_SVR2_predvl)
    SVR2_vlMAE=mean_absolute_error(y_val, y_SVR2_predvl)
    
    SVR2_df=pd.DataFrame({'Method':['SVR2'],'Val Score':SVR2_vlscore,'RMSE_vl': SVR2_vlRMSE, 'MSE_vl': SVR2_vlMSE, 'MAE_vl': SVR2_vlMAE,'train Score':SVR2_trscore,'RMSE_tr': SVR2_trRMSE, 'MSE_tr': SVR2_trMSE, 'MAE_tr': SVR2_trMAE})
    Compa_df = pd.concat([Compa_df, SVR2_df])
    
    Compa_df
    
    Out[151]:
    Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786
    0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117
    0 Linear-Reg Ridge1 0.718929 137689.597398 1.895843e+10 93992.809617 0.729789 133037.735155 1.769904e+10 92497.255174
    0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707
    0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170
    0 SVR2 0.458252 191157.623415 3.654124e+10 132876.663665 0.454410 189041.408746 3.573665e+10 130250.868504

    The SVR model with modified parameters has not performed well with just ~0.45 in both training and validation data sets

    Decision Tree Regressor

    In [152]:
    from sklearn.tree import DecisionTreeRegressor
    
    In [153]:
    DT1 = DecisionTreeRegressor()
    DT1.fit(X_train, y_train)
    
    y_DT1_predtr= DT1.predict(X_train)
    y_DT1_predvl= DT1.predict(X_val)
    
    #Model score and Deduction for each Model in a DataFrame
    DT1_trscore=r2_score(y_train,y_DT1_predtr)
    DT1_trRMSE=np.sqrt(mean_squared_error(y_train, y_DT1_predtr))
    DT1_trMSE=mean_squared_error(y_train, y_DT1_predtr)
    DT1_trMAE=mean_absolute_error(y_train, y_DT1_predtr)
    
    DT1_vlscore=r2_score(y_val,y_DT1_predvl)
    DT1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_DT1_predvl))
    DT1_vlMSE=mean_squared_error(y_val, y_DT1_predvl)
    DT1_vlMAE=mean_absolute_error(y_val, y_DT1_predvl)
    
    DT1_df=pd.DataFrame({'Method':['DT1'],'Val Score':DT1_vlscore,'RMSE_vl': DT1_vlRMSE, 'MSE_vl': DT1_vlMSE, 'MAE_vl': DT1_vlMAE,'train Score':DT1_trscore,'RMSE_tr': DT1_trRMSE, 'MSE_tr': DT1_trMSE, 'MAE_tr': DT1_trMAE})
    Compa_df = pd.concat([Compa_df, DT1_df])
    
    Compa_df
    
    Out[153]:
    Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786
    0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117
    0 Linear-Reg Ridge1 0.718929 137689.597398 1.895843e+10 93992.809617 0.729789 133037.735155 1.769904e+10 92497.255174
    0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707
    0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170
    0 SVR2 0.458252 191157.623415 3.654124e+10 132876.663665 0.454410 189041.408746 3.573665e+10 130250.868504
    0 DT1 0.542495 175667.376246 3.085903e+10 109891.238551 0.998628 9480.192071 8.987404e+07 887.708707

    Above performance of initial Decision tree model shows overfit in training set with 0.99 score and low performance in validation set

    In [154]:
    DT2 = DecisionTreeRegressor(max_depth=10,min_samples_leaf=5)
    DT2.fit(X_train, y_train)
    
    y_DT2_predtr= DT2.predict(X_train)
    y_DT2_predvl= DT2.predict(X_val)
    
    #Model score and Deduction for each Model in a DataFrame
    DT2_trscore=r2_score(y_train,y_DT2_predtr)
    DT2_trRMSE=np.sqrt(mean_squared_error(y_train, y_DT2_predtr))
    DT2_trMSE=mean_squared_error(y_train, y_DT2_predtr)
    DT2_trMAE=mean_absolute_error(y_train, y_DT2_predtr)
    
    DT2_vlscore=r2_score(y_val,y_DT2_predvl)
    DT2_vlRMSE=np.sqrt(mean_squared_error(y_val, y_DT2_predvl))
    DT2_vlMSE=mean_squared_error(y_val, y_DT2_predvl)
    DT2_vlMAE=mean_absolute_error(y_val, y_DT2_predvl)
    
    DT2_df=pd.DataFrame({'Method':['DT2'],'Val Score':DT2_vlscore,'RMSE_vl': DT2_vlRMSE, 'MSE_vl': DT2_vlMSE, 'MAE_vl': DT2_vlMAE,'train Score':DT2_trscore,'RMSE_tr': DT2_trRMSE, 'MSE_tr': DT2_trMSE, 'MAE_tr': DT2_trMAE})
    Compa_df = pd.concat([Compa_df, DT2_df])
    
    Compa_df
    
    Out[154]:
    Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786
    0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117
    0 Linear-Reg Ridge1 0.718929 137689.597398 1.895843e+10 93992.809617 0.729789 133037.735155 1.769904e+10 92497.255174
    0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707
    0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170
    0 SVR2 0.458252 191157.623415 3.654124e+10 132876.663665 0.454410 189041.408746 3.573665e+10 130250.868504
    0 DT1 0.542495 175667.376246 3.085903e+10 109891.238551 0.998628 9480.192071 8.987404e+07 887.708707
    0 DT2 0.637513 156364.920550 2.444999e+10 102458.587308 0.794647 115977.718333 1.345083e+10 82537.840190

    Above decision tree model with modified parameter has better performed on the training set and validation set compared to initial decision tree model.But overall decision tree has not performed well than linear regression models.

    In [155]:
    sns.set(style="darkgrid", color_codes=True)
                
    with sns.axes_style("white"):
        sns.jointplot(x=y_val, y=y_DT2_predvl, kind="reg", color="k")
    

    In summary, KNN regressor model and decision tree models have not performed well in comparison with linear regression models

    Ensemble techniques

    Boosting and Bagging

    In [156]:
    from sklearn.ensemble import GradientBoostingRegressor, BaggingRegressor
    
    In [157]:
    GB1=GradientBoostingRegressor(n_estimators = 200, learning_rate = 0.1, random_state=22)
    GB1.fit(X_train, y_train)
    
    y_GB1_predtr= GB1.predict(X_train)
    y_GB1_predvl= GB1.predict(X_val)
    
    #Model score and Deduction for each Model in a DataFrame
    GB1_trscore=r2_score(y_train,y_GB1_predtr)
    GB1_trRMSE=np.sqrt(mean_squared_error(y_train, y_GB1_predtr))
    GB1_trMSE=mean_squared_error(y_train, y_GB1_predtr)
    GB1_trMAE=mean_absolute_error(y_train, y_GB1_predtr)
    
    GB1_vlscore=r2_score(y_val,y_GB1_predvl)
    GB1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_GB1_predvl))
    GB1_vlMSE=mean_squared_error(y_val, y_GB1_predvl)
    GB1_vlMAE=mean_absolute_error(y_val, y_GB1_predvl)
    
    GB1_df=pd.DataFrame({'Method':['GB1'],'Val Score':GB1_vlscore,'RMSE_vl': GB1_vlRMSE, 'MSE_vl': GB1_vlMSE, 'MAE_vl': GB1_vlMAE,'train Score':GB1_trscore,'RMSE_tr': GB1_trRMSE, 'MSE_tr': GB1_trMSE, 'MAE_tr': GB1_trMAE})
    Compa_df = pd.concat([Compa_df, GB1_df])
    
    Compa_df
    
    Out[157]:
    Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786
    0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117
    0 Linear-Reg Ridge1 0.718929 137689.597398 1.895843e+10 93992.809617 0.729789 133037.735155 1.769904e+10 92497.255174
    0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707
    0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170
    0 SVR2 0.458252 191157.623415 3.654124e+10 132876.663665 0.454410 189041.408746 3.573665e+10 130250.868504
    0 DT1 0.542495 175667.376246 3.085903e+10 109891.238551 0.998628 9480.192071 8.987404e+07 887.708707
    0 DT2 0.637513 156364.920550 2.444999e+10 102458.587308 0.794647 115977.718333 1.345083e+10 82537.840190
    0 GB1 0.782471 121129.989228 1.467247e+10 82824.319932 0.820821 108334.766538 1.173642e+10 76533.619644

    Gradient boosting model has provided good scores in both training and validation sets

    In [158]:
    BGG1=BaggingRegressor(n_estimators=50, oob_score= True,random_state=14)
    BGG1.fit(X_train, y_train)
    
    y_BGG1_predtr= BGG1.predict(X_train)
    y_BGG1_predvl= BGG1.predict(X_val)
    
    #Model score and Deduction for each Model in a DataFrame
    BGG1_trscore=r2_score(y_train,y_BGG1_predtr)
    BGG1_trRMSE=np.sqrt(mean_squared_error(y_train, y_BGG1_predtr))
    BGG1_trMSE=mean_squared_error(y_train, y_BGG1_predtr)
    BGG1_trMAE=mean_absolute_error(y_train, y_BGG1_predtr)
    
    BGG1_vlscore=r2_score(y_val,y_BGG1_predvl)
    BGG1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_BGG1_predvl))
    BGG1_vlMSE=mean_squared_error(y_val, y_BGG1_predvl)
    BGG1_vlMAE=mean_absolute_error(y_val, y_BGG1_predvl)
    
    BGG1_df=pd.DataFrame({'Method':['BGG1'],'Val Score':BGG1_vlscore,'RMSE_vl': BGG1_vlRMSE, 'MSE_vl':BGG1_vlMSE, 'MAE_vl': BGG1_vlMAE,'train Score':BGG1_trscore,'RMSE_tr': BGG1_trRMSE, 'MSE_tr': BGG1_trMSE, 'MAE_tr': BGG1_trMAE})
    Compa_df = pd.concat([Compa_df, BGG1_df])
    
    Compa_df
    
    Out[158]:
    Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786
    0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117
    0 Linear-Reg Ridge1 0.718929 137689.597398 1.895843e+10 93992.809617 0.729789 133037.735155 1.769904e+10 92497.255174
    0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707
    0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170
    0 SVR2 0.458252 191157.623415 3.654124e+10 132876.663665 0.454410 189041.408746 3.573665e+10 130250.868504
    0 DT1 0.542495 175667.376246 3.085903e+10 109891.238551 0.998628 9480.192071 8.987404e+07 887.708707
    0 DT2 0.637513 156364.920550 2.444999e+10 102458.587308 0.794647 115977.718333 1.345083e+10 82537.840190
    0 GB1 0.782471 121129.989228 1.467247e+10 82824.319932 0.820821 108334.766538 1.173642e+10 76533.619644
    0 BGG1 0.769319 124738.101557 1.555959e+10 80102.360544 0.966466 46867.181534 2.196533e+09 29441.780117

    Bagging model also performed well in training and validation sets.There seems to be overfitting in training set. We need to analyse further by hypertuning

    Random forest

    In [159]:
    from sklearn.ensemble import RandomForestRegressor
    
    In [160]:
    RF1=RandomForestRegressor()
    RF1.fit(X_train, y_train)
    
    y_RF1_predtr= RF1.predict(X_train)
    y_RF1_predvl= RF1.predict(X_val)
    
    #Model score and Deduction for each Model in a DataFrame
    RF1_trscore=r2_score(y_train,y_RF1_predtr)
    RF1_trRMSE=np.sqrt(mean_squared_error(y_train, y_RF1_predtr))
    RF1_trMSE=mean_squared_error(y_train, y_RF1_predtr)
    RF1_trMAE=mean_absolute_error(y_train, y_RF1_predtr)
    
    RF1_vlscore=r2_score(y_val,y_RF1_predvl)
    RF1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_RF1_predvl))
    RF1_vlMSE=mean_squared_error(y_val, y_RF1_predvl)
    RF1_vlMAE=mean_absolute_error(y_val, y_RF1_predvl)
    
    RF1_df=pd.DataFrame({'Method':['RF1'],'Val Score':RF1_vlscore,'RMSE_vl': RF1_vlRMSE, 'MSE_vl':RF1_vlMSE, 'MAE_vl': RF1_vlMAE,'train Score':RF1_trscore,'RMSE_tr': RF1_trRMSE, 'MSE_tr': RF1_trMSE, 'MAE_tr': RF1_trMAE})
    Compa_df = pd.concat([Compa_df, RF1_df])
    
    Compa_df
    
    Out[160]:
    Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786
    0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117
    0 Linear-Reg Ridge1 0.718929 137689.597398 1.895843e+10 93992.809617 0.729789 133037.735155 1.769904e+10 92497.255174
    0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707
    0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170
    0 SVR2 0.458252 191157.623415 3.654124e+10 132876.663665 0.454410 189041.408746 3.573665e+10 130250.868504
    0 DT1 0.542495 175667.376246 3.085903e+10 109891.238551 0.998628 9480.192071 8.987404e+07 887.708707
    0 DT2 0.637513 156364.920550 2.444999e+10 102458.587308 0.794647 115977.718333 1.345083e+10 82537.840190
    0 GB1 0.782471 121129.989228 1.467247e+10 82824.319932 0.820821 108334.766538 1.173642e+10 76533.619644
    0 BGG1 0.769319 124738.101557 1.555959e+10 80102.360544 0.966466 46867.181534 2.196533e+09 29441.780117
    0 RF1 0.754483 128686.871977 1.656031e+10 82901.717082 0.954362 54674.891629 2.989344e+09 33099.588581

    Random forest model has performed well in training and validation set. There is scope of further analysis on this model

    Enseble models: in summary ensemble models have performed well on training and validation sets. These models will be selected for further analysis with hypertuning and feature selection
    In [161]:
    #feature importance
    rf_imp_feature_1=pd.DataFrame(RF1.feature_importances_, columns = ["Imp"], index = X_val.columns)
    rf_imp_feature_1.sort_values(by="Imp",ascending=False)
    rf_imp_feature_1['Imp'] = rf_imp_feature_1['Imp'].map('{0:.5f}'.format)
    rf_imp_feature_1=rf_imp_feature_1.sort_values(by="Imp",ascending=False)
    rf_imp_feature_1.Imp=rf_imp_feature_1.Imp.astype("float")
    
    rf_imp_feature_1[:30].plot.bar(figsize=(plotSizeX, plotSizeY))
    
    #First 20 features have an importance of 90.5% and first 30 have importance of 95.15
    print("First 20 feature importance:\t",(rf_imp_feature_1[:20].sum())*100)
    print("First 30 feature importance:\t",(rf_imp_feature_1[:30].sum())*100)
    
    First 20 feature importance:	 Imp    90.184
    dtype: float64
    First 30 feature importance:	 Imp    95.098
    dtype: float64
    

    Above are top 30 important features that account for 95% of variation in model. This need to be further analysed during hypertuning of the models for better scores

    Model performance Summary:

    Ensemble methods are performing better than linear models. Of all the ensemble models, Gradient boosting regressor is giving better R2 score. we identified top 30 features that are explaining the 95% variation in model(Random Forest). Will further hypertune the model to improve the model performance. Will further explore and evaluate the features while hyperturning the ensemble models

    Building Function/Pipeline for models

    In [162]:
    rf_imp_feature_1[:30]
    
    Out[162]:
    Imp
    furnished_1 0.28448
    yr_built 0.14227
    living_measure 0.09463
    living_measure15 0.06691
    quality_8 0.05062
    HouseLandRatio 0.04008
    lot_measure15 0.03731
    City_Bellevue 0.02532
    ceil_measure 0.02459
    quality_9 0.02049
    total_area 0.01527
    lot_measure 0.01319
    City_Seattle 0.01268
    City_Kirkland 0.01245
    City_Federal Way 0.01224
    City_Kent 0.01089
    City_Mercer Island 0.01047
    sight_4 0.00945
    quality_7 0.00942
    basement 0.00908
    City_Redmond 0.00830
    coast_1 0.00648
    City_Medina 0.00556
    quality_10 0.00545
    City_Renton 0.00521
    room_bed_4 0.00393
    City_Maple Valley 0.00388
    City_Sammamish 0.00379
    sight_3 0.00351
    City_Issaquah 0.00303
    In [163]:
    from sklearn.pipeline import Pipeline
    
    In [164]:
    def result (model,pipe_model,X_train_set,y_train_set,X_val_set,y_val_set):
        pipe_model.fit(X_train_set,y_train_set)
        #predicting result over test data
        y_train_predict= pipe_model.predict(X_train_set)
        y_val_predict= pipe_model.predict(X_val_set)
    
        trscore=r2_score(y_train_set,y_train_predict)
        trRMSE=np.sqrt(mean_squared_error(y_train_set,y_train_predict))
        trMSE=mean_squared_error(y_train_set,y_train_predict)
        trMAE=mean_absolute_error(y_train_set,y_train_predict)
    
        vlscore=r2_score(y_val,y_val_predict)
        vlRMSE=np.sqrt(mean_squared_error(y_val,y_val_predict))
        vlMSE=mean_squared_error(y_val,y_val_predict)
        vlMAE=mean_absolute_error(y_val,y_val_predict)
        result_df=pd.DataFrame({'Method':[model],'val score':vlscore,'RMSE_val':vlRMSE,'MSE_val':vlMSE,'MSE_vl': vlMSE,
                              'train Score':trscore,'RMSE_tr': trRMSE,'MSE_tr': trMSE, 'MAE_tr': trMAE})  
        return result_df
    

    Above function will run the model and return the r2 score,rmse,mse of the model

    In [165]:
    #Creating empty dataframe to capture results
    result_dff=pd.DataFrame()
    pipe_LR = Pipeline([('LR', LinearRegression())])
    result_dff=pd.concat([result_dff,result('LR',pipe_LR,X_train,y_train,X_val,y_val)])
    
    pipe_knr = Pipeline([('KNNR', KNeighborsRegressor(n_neighbors=4,weights='distance'))])
    result_dff=pd.concat([result_dff,result('KNNR',pipe_knr,X_train,y_train,X_val,y_val)])
    
    pipe_DTR = Pipeline([('DTR', DecisionTreeRegressor())])
    result_dff=pd.concat([result_dff,result('DTR',pipe_DTR,X_train,y_train,X_val,y_val)])
    
    pipe_GBR = Pipeline([('GBR', GradientBoostingRegressor(n_estimators = 200, learning_rate = 0.1, random_state=22))])
    result_dff=pd.concat([result_dff,result('GBR',pipe_GBR,X_train,y_train,X_val,y_val)])
    
    pipe_BGR = Pipeline([('BGR', BaggingRegressor(n_estimators=50, oob_score= True,random_state=14))])
    result_dff=pd.concat([result_dff,result('BGR',pipe_BGR,X_train,y_train,X_val,y_val)])
    
    pipe_RFR = Pipeline([('RFR', RandomForestRegressor())])
    result_dff=pd.concat([result_dff,result('RFR',pipe_RFR,X_train,y_train,X_val,y_val)])
    
    result_dff
    
    Out[165]:
    Method val score RMSE_val MSE_val MSE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 LR 0.718749 137733.698415 1.897057e+10 1.897057e+10 0.730112 132958.367261 1.767793e+10 92391.001786
    0 KNNR 0.425008 196935.451160 3.878357e+10 3.878357e+10 0.998628 9480.192071 8.987404e+07 887.708707
    0 DTR 0.537219 176677.375867 3.121490e+10 3.121490e+10 0.998628 9480.192071 8.987404e+07 887.708707
    0 GBR 0.782471 121129.989228 1.467247e+10 1.467247e+10 0.820821 108334.766538 1.173642e+10 76533.619644
    0 BGR 0.769319 124738.101557 1.555959e+10 1.555959e+10 0.966466 46867.181534 2.196533e+09 29441.780117
    0 RFR 0.757473 127900.773592 1.635861e+10 1.635861e+10 0.955380 54061.682258 2.922665e+09 32834.525684

    Above sequence of steps with pipeline function will run all the models and compile the scores in result_dff dataframe. We can see that the above 2 steps are concise instead of running individual models and compiling the scores as earlier.

    We can clearly see gradient boosting is giving better result in comparison with other ensemble methods. Also the score of 0.82 on training set indicates no overfitting of the model

    In [166]:
    #Storing results of initial data set - dff
    
    result_ds1=result_dff.copy()
    result_ds1
    
    Out[166]:
    Method val score RMSE_val MSE_val MSE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 LR 0.718749 137733.698415 1.897057e+10 1.897057e+10 0.730112 132958.367261 1.767793e+10 92391.001786
    0 KNNR 0.425008 196935.451160 3.878357e+10 3.878357e+10 0.998628 9480.192071 8.987404e+07 887.708707
    0 DTR 0.537219 176677.375867 3.121490e+10 3.121490e+10 0.998628 9480.192071 8.987404e+07 887.708707
    0 GBR 0.782471 121129.989228 1.467247e+10 1.467247e+10 0.820821 108334.766538 1.173642e+10 76533.619644
    0 BGR 0.769319 124738.101557 1.555959e+10 1.555959e+10 0.966466 46867.181534 2.196533e+09 29441.780117
    0 RFR 0.757473 127900.773592 1.635861e+10 1.635861e+10 0.955380 54061.682258 2.922665e+09 32834.525684

    FEATURE SELECTION (PCA)

    Now, we will explore the possibility of features reduction using PCA

    In [167]:
    dff.shape
    
    Out[167]:
    (18287, 91)
    In [168]:
    dff.columns
    
    Out[168]:
    Index(['price', 'living_measure', 'lot_measure', 'ceil_measure', 'basement',
           'yr_built', 'living_measure15', 'lot_measure15', 'total_area',
           'HouseLandRatio', 'room_bed_1', 'room_bed_2', 'room_bed_3',
           'room_bed_4', 'room_bed_5', 'room_bed_6', 'room_bed_7', 'room_bed_8',
           'room_bed_9', 'room_bed_10', 'room_bed_11', 'room_bath_0.5',
           'room_bath_0.75', 'room_bath_1.0', 'room_bath_1.25', 'room_bath_1.5',
           'room_bath_1.75', 'room_bath_2.0', 'room_bath_2.25', 'room_bath_2.5',
           'room_bath_2.75', 'room_bath_3.0', 'room_bath_3.25', 'room_bath_3.5',
           'room_bath_3.75', 'room_bath_4.0', 'room_bath_4.25', 'room_bath_4.5',
           'room_bath_4.75', 'room_bath_5.0', 'room_bath_5.25', 'room_bath_5.75',
           'ceil_1.5', 'ceil_2.0', 'ceil_2.5', 'ceil_3.0', 'ceil_3.5', 'coast_1',
           'sight_1', 'sight_2', 'sight_3', 'sight_4', 'condition_2',
           'condition_3', 'condition_4', 'condition_5', 'quality_4', 'quality_5',
           'quality_6', 'quality_7', 'quality_8', 'quality_9', 'quality_10',
           'quality_11', 'quality_12', 'furnished_1', 'City_Bellevue',
           'City_Black Diamond', 'City_Bothell', 'City_Carnation', 'City_Duvall',
           'City_Enumclaw', 'City_Fall City', 'City_Federal Way', 'City_Issaquah',
           'City_Kenmore', 'City_Kent', 'City_Kirkland', 'City_Maple Valley',
           'City_Medina', 'City_Mercer Island', 'City_North Bend', 'City_Redmond',
           'City_Renton', 'City_Sammamish', 'City_Seattle', 'City_Snoqualmie',
           'City_Vashon', 'City_Woodinville', 'has_basement_Yes',
           'has_renovated_Yes'],
          dtype='object')

    will drop the price column as it is the target variable

    In [169]:
    df_pca = dff.drop(['price'], axis = 1)
    
    In [170]:
    numerical_cols = df_pca.copy()
    
    numerical_cols.shape
    
    Out[170]:
    (18287, 90)
    In [171]:
    # Let's first transform the entire X (independent variable data) to zscores. 
    # We will create the PCA dimensions on this distribution.
    from scipy.stats import zscore
    
    # As PCA for Independent columns of Numerical types, let's pass numerical_cols (16 numerical features)
    numerical_cols =  numerical_cols.apply(zscore)      
    
    cov_matrix = np.cov(numerical_cols.T)
    print('Covariance Matrix \n%s', cov_matrix)
    
    Covariance Matrix 
    %s [[ 1.00005469  0.20028185  0.84597846 ...  0.01415428  0.20094885
       0.05257785]
     [ 0.20028185  1.00005469  0.1663024  ...  0.08035946 -0.02988448
      -0.00617414]
     [ 0.84597846  0.1663024   1.00005469 ...  0.01649371 -0.27730605
       0.01739462]
     ...
     [ 0.01415428  0.08035946  0.01649371 ...  1.00005469 -0.0056238
      -0.01445085]
     [ 0.20094885 -0.02988448 -0.27730605 ... -0.0056238   1.00005469
       0.04524435]
     [ 0.05257785 -0.00617414  0.01739462 ... -0.01445085  0.04524435
       1.00005469]]
    

    As we can see, near the value to 1, more the features related.

    In [172]:
    eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
    print('Eigen Vectors \n%s', eigenvectors)
    print('\n Eigen Values \n%s', eigenvalues)
    
    Eigen Vectors 
    %s [[ 3.38140157e-01 -5.91272225e-02  2.10933458e-01 ... -5.27282174e-03
       5.54142192e-03 -1.67124034e-04]
     [ 7.12659835e-02 -4.34121260e-01 -8.88436080e-02 ... -1.68818774e-02
       4.57078107e-03 -8.52995967e-03]
     [ 3.49772357e-01 -8.00383876e-03 -4.05156781e-02 ... -4.68496505e-03
      -1.39683527e-02  2.43243338e-03]
     ...
     [ 1.29688720e-02 -3.80398560e-02 -3.41813657e-02 ...  1.07174062e-01
       8.41737918e-02 -1.95413367e-01]
     [-2.50075399e-02 -3.74622282e-02  4.39476539e-01 ...  3.99982110e-03
       4.08181763e-02  2.32617269e-02]
     [-3.18537004e-03 -6.23000043e-04  1.02661626e-01 ...  2.84579669e-02
      -1.47963772e-02 -6.62692406e-02]]
    
     Eigen Values 
    %s [ 6.40030103e+00  4.23053272e+00  3.02200570e+00  2.36069955e+00
      1.72278028e+00  1.70533047e+00  5.17634008e-02  7.84864255e-02
      1.23323929e-01  1.58239483e+00  1.94704947e-01  2.10588552e-01
      2.45372409e-01  3.37764061e-01  3.52383334e-01  2.24756725e-03
      9.93351422e-04  1.28503648e-04  8.54683326e-05  1.51669793e+00
      3.97816689e-01  1.48400510e+00  4.25049450e-01 -5.20329656e-16
     -1.83229560e-15  3.57406920e-15  1.39212554e+00  1.33812387e+00
      5.71411667e-01  6.48215227e-01  6.60453404e-01  1.27455883e+00
      6.90208644e-01  7.30900855e-01  1.22358633e+00  1.21781188e+00
      7.54916613e-01  7.61951753e-01  7.89272221e-01  1.19439921e+00
      1.18354682e+00  8.08765828e-01  8.31761100e-01  1.17521503e+00
      1.16073113e+00  8.62975337e-01  1.14847039e+00  8.79158894e-01
      1.11948938e+00  1.10960276e+00  8.90644524e-01  8.88567656e-01
      9.01761603e-01  1.10493861e+00  9.16012433e-01  9.31041146e-01
      1.09143428e+00  1.08460485e+00  1.08273453e+00  1.07118893e+00
      9.33793856e-01  1.06368893e+00  9.41694315e-01  9.44273389e-01
      9.49801385e-01  9.52927340e-01  1.05455290e+00  1.04955645e+00
      1.04815072e+00  1.04163633e+00  9.69808694e-01  1.03813696e+00
      1.03345195e+00  1.02768165e+00  1.02381893e+00  9.82887562e-01
      9.81254198e-01  9.86986081e-01  1.01521697e+00  9.89390243e-01
      9.93680575e-01  9.93261992e-01  1.00195909e+00  1.01363137e+00
      1.01189827e+00  1.00051625e+00  1.00419597e+00  1.00622516e+00
      1.00552678e+00  1.00928050e+00]
    
    In [173]:
    # Let's Sort eigenvalues in descending order
    
    # Make a set of (eigenvalue, eigenvector) pairs
    eig_pairs = [(eigenvalues[index], eigenvectors[:,index]) for index in range(len(eigenvalues))]
    
    # Sort the (eigenvalue, eigenvector) pairs from highest to lowest with respect to eigenvalue
    eig_pairs.sort()
    
    eig_pairs.reverse()
    print(eig_pairs)
    
    # Extract the descending ordered eigenvalues and eigenvectors
    eigvalues_sorted = [eig_pairs[index][0] for index in range(len(eigenvalues))]
    eigvectors_sorted = [eig_pairs[index][1] for index in range(len(eigenvalues))]
    
    # Let's confirm our sorting worked, print out eigenvalues
    print('Eigenvalues in descending order: \n%s' %eigvalues_sorted)
    
    [(6.400301029851477, array([ 3.38140157e-01,  7.12659835e-02,  3.49772357e-01,  1.21826278e-02,
            2.45688226e-01,  3.13509594e-01,  5.85711636e-02,  1.32118243e-01,
            1.24188241e-01, -5.39557358e-02, -1.55322767e-01, -7.38082454e-02,
            1.69280354e-01,  6.71341866e-02,  1.76807150e-02,  8.40382554e-03,
            2.25638942e-03,  3.61933268e-03,  1.42331836e-03,  5.17260476e-04,
           -5.63803984e-03, -2.85596080e-02, -2.20914937e-01, -2.56073225e-03,
           -6.59873013e-02, -6.94849495e-02, -5.10691490e-02,  2.99009064e-02,
            1.96828062e-01,  7.27758185e-02,  4.47037234e-02,  6.43544191e-02,
            8.98255374e-02,  3.57441062e-02,  2.39042699e-02,  1.82629160e-02,
            2.35903040e-02,  9.52118613e-03,  6.36445584e-03,  6.71514053e-03,
            4.22319789e-03, -8.25472478e-02,  2.65712150e-01,  2.24314600e-02,
           -1.44008864e-03, -1.59380115e-04,  1.36351362e-02,  1.21584859e-02,
            2.79651538e-02,  3.46589095e-02,  2.87868177e-02, -3.59443998e-02,
            1.23735329e-01, -9.23789546e-02, -5.48045043e-02, -2.04647878e-02,
           -5.44353495e-02, -1.55877412e-01, -1.66909198e-01,  9.78807036e-02,
            2.09856238e-01,  1.36794862e-01,  5.65981168e-02,  8.75831765e-03,
            2.64286457e-01,  4.83149865e-02, -8.63848741e-03,  2.05515336e-02,
           -1.04425520e-02,  2.46487074e-02, -1.36322635e-02, -8.34913899e-03,
            5.58571616e-03,  5.97895799e-02,  1.67219076e-02,  1.26648246e-02,
            1.73286710e-02,  4.02906127e-02,  1.35579146e-02,  4.06324757e-02,
            3.20240017e-03,  5.54293286e-02,  2.79650623e-02,  1.07319938e-01,
           -1.80305747e-01,  5.49209909e-02, -5.88852015e-03,  1.29688720e-02,
           -2.50075399e-02, -3.18537004e-03])), (4.23053271502051, array([-0.05912722, -0.43412126, -0.00800384, -0.09492954,  0.13056396,
           -0.09447739, -0.370925  , -0.42105132,  0.38853903,  0.00818031,
            0.09903886, -0.00322975, -0.05039931, -0.0397953 , -0.01135684,
           -0.00195537,  0.00187349,  0.00561416, -0.00403777,  0.00329149,
            0.00403127, -0.01266106, -0.02264596,  0.00982359,  0.00641089,
           -0.1047631 , -0.01951081, -0.01220764,  0.1019734 , -0.03086617,
            0.0212307 ,  0.04710655,  0.04081646, -0.00117921,  0.00207246,
            0.00066914, -0.00061282, -0.00468101, -0.00095167,  0.00073386,
            0.0008508 , -0.00237293,  0.15398148,  0.03143851,  0.20703993,
            0.0235899 , -0.04163645, -0.02784399, -0.01428115, -0.01787219,
           -0.04086905, -0.02070046,  0.21897162, -0.20078997, -0.05272298,
           -0.00838391, -0.0126264 , -0.02109097, -0.07477947,  0.10406576,
            0.01781302, -0.03510238, -0.02017662, -0.00081656, -0.00543061,
           -0.12229314, -0.02405586, -0.01865954, -0.03518784, -0.00616237,
           -0.02599962, -0.02240527, -0.06022892,  0.0647382 , -0.03107846,
           -0.03563074, -0.05546802,  0.03231613, -0.02937766, -0.07517532,
           -0.03530176,  0.00217373, -0.04298141, -0.03445572,  0.1795959 ,
            0.03400426, -0.04414761, -0.03803986, -0.03746223, -0.000623  ])), (3.0220056962108997, array([ 0.21093346, -0.08884361, -0.04051568,  0.45923331, -0.20948573,
            0.06094638, -0.08769888, -0.04338803,  0.18999279, -0.03429171,
           -0.04303535, -0.14081992,  0.09630094,  0.14413304,  0.08075213,
            0.03087204,  0.0216305 ,  0.02874812,  0.01352604,  0.01099741,
           -0.00046969, -0.0253926 , -0.12810462,  0.00801375, -0.00546132,
            0.08028745,  0.05174082,  0.02460706, -0.17141934,  0.10063382,
            0.10513547,  0.08968952,  0.09751819,  0.04988048,  0.04180795,
            0.0360652 ,  0.04631527,  0.01510422,  0.00364895,  0.00749181,
            0.00784258,  0.14066273, -0.13476605,  0.0508906 ,  0.01324093,
            0.00143509,  0.04433479,  0.0745123 ,  0.11139792,  0.10282392,
            0.08280422, -0.02017733, -0.15120561,  0.08089571,  0.1413166 ,
           -0.03014322, -0.04765138, -0.07345488, -0.00271769,  0.02248996,
            0.03418131,  0.03990794,  0.02663322,  0.01611412,  0.05546425,
            0.03544567, -0.03015266, -0.01667398, -0.0401713 , -0.05250393,
           -0.04129285, -0.02118915, -0.04979088, -0.04044887, -0.021319  ,
           -0.10458144, -0.01955768, -0.10198589,  0.00537009,  0.0366488 ,
           -0.056401  , -0.05468041, -0.08192034, -0.05254563,  0.26984296,
           -0.05563353, -0.00285831, -0.03418137,  0.43947654,  0.10266163])), (2.3606995503606907, array([-0.00720377,  0.00224618,  0.07573216, -0.14547718, -0.20077718,
            0.05062147,  0.01439989,  0.00073809, -0.07253922,  0.14716631,
            0.26402065, -0.27661743,  0.05624118,  0.02404274,  0.00526146,
            0.01216159, -0.00804137,  0.00257074, -0.00579735,  0.0020183 ,
            0.01828875,  0.09307033,  0.32371789,  0.01903027, -0.05286308,
           -0.17577786, -0.06178217, -0.14789745, -0.06749212,  0.02029172,
            0.01273173,  0.05388027,  0.072458  ,  0.05551253,  0.03184281,
            0.03028663,  0.02775791,  0.0149579 , -0.00807084,  0.00573534,
            0.00706833,  0.10759155,  0.00949538,  0.0310457 , -0.05454233,
           -0.00721205,  0.10292123,  0.0113677 ,  0.03206102,  0.04935804,
            0.11456154,  0.0672181 ,  0.02340023, -0.05185769,  0.01734833,
            0.07277432,  0.13346116,  0.2975113 , -0.2129159 , -0.23715036,
            0.24099786,  0.16208952,  0.06896703,  0.00896484,  0.30668776,
           -0.03052305,  0.01273904, -0.03055353,  0.01560851, -0.02733405,
           -0.00267958,  0.02118784, -0.08928568, -0.02238202, -0.04074563,
           -0.10262501, -0.05898896, -0.06153782,  0.01533014,  0.04045805,
           -0.03335646, -0.03231935, -0.03451797,  0.10004965,  0.16962266,
            0.00513287,  0.03516863, -0.02673597, -0.16413819,  0.08959676])), (1.7227802802980245, array([-2.75891943e-02,  6.22663979e-02, -5.87154230e-02,  5.16876256e-02,
            4.84901490e-02,  1.72202394e-02,  1.03722364e-01,  5.34730987e-02,
            2.21522986e-04,  1.75531777e-01,  7.48912126e-02, -2.40660821e-02,
           -6.99276559e-02, -1.38633482e-02,  3.87175860e-02,  1.78451840e-02,
           -6.46767863e-03,  1.31501431e-02, -1.18895907e-02, -2.37987660e-03,
            2.57882460e-02,  2.03529963e-01, -2.22415917e-02,  4.43677778e-02,
           -7.22222733e-02, -9.70041136e-03, -8.15912951e-02,  1.25976552e-01,
            1.12053923e-02, -1.57737740e-02,  7.36220296e-03,  1.20741003e-02,
           -1.82694555e-02, -3.03469013e-03,  2.37479559e-02, -1.61044922e-03,
            6.17820745e-02, -5.39089932e-03,  1.58457297e-02, -9.48287976e-04,
            2.06938171e-02, -1.19344878e-01, -1.05065660e-02, -8.78509029e-03,
            1.16979339e-01,  1.86698328e-02,  3.82937199e-01,  2.08740410e-02,
            1.00421405e-02,  3.70161361e-02,  3.50285374e-01,  4.35659169e-02,
            8.36701642e-02, -5.79924748e-02, -6.94195526e-02,  1.50096333e-01,
            1.05193372e-01,  8.15102561e-02, -3.43044936e-01,  4.55724591e-01,
           -2.36634142e-01, -3.34803092e-02,  3.91109747e-02, -3.62507673e-03,
           -2.18796507e-01,  2.77370246e-02,  1.42157352e-02,  2.65573237e-03,
            5.98256022e-02,  3.65080662e-02, -7.91148709e-03,  3.45710951e-02,
            4.41525866e-02,  3.01403507e-02,  2.24997611e-02, -6.09415475e-02,
           -5.32191501e-02, -3.87528058e-02,  6.06045961e-03,  2.43959995e-02,
            1.73155089e-02,  2.91906845e-02, -1.62743839e-02, -1.11980582e-02,
           -5.78750293e-03, -2.82725675e-02,  1.74298356e-01,  4.41241036e-03,
            5.27653264e-02,  8.14799322e-02])), (1.7053304684180373, array([-0.04304967,  0.03835547, -0.049328  ,  0.00682403,  0.06603838,
           -0.00678647,  0.07094793,  0.0279412 ,  0.01134168, -0.03214826,
           -0.1074207 ,  0.44510897, -0.38919037, -0.0100558 , -0.00243799,
           -0.01259858, -0.00250479,  0.01319458, -0.0099384 ,  0.00592628,
           -0.02191463, -0.00796351, -0.02695272,  0.04413847,  0.07172762,
            0.10503604,  0.00454758, -0.01146529, -0.12597964, -0.10725771,
           -0.01547844,  0.12430449,  0.07892846,  0.04778133,  0.04080463,
            0.0258032 ,  0.044399  ,  0.01198528, -0.00248636,  0.01644248,
           -0.00839736, -0.10453813, -0.05859151,  0.02176298,  0.14312427,
           -0.00625977,  0.25531019,  0.00308563,  0.02034035,  0.07405128,
            0.25196789, -0.03486212,  0.21168428, -0.15449296, -0.11058998,
           -0.02796998, -0.07348146, -0.09426926,  0.18688518, -0.28658732,
            0.12769751,  0.15962844,  0.10054937,  0.0230538 ,  0.21152038,
           -0.03815552,  0.00385181,  0.01635949,  0.02444802, -0.0144477 ,
           -0.00507272,  0.00141885, -0.00511901, -0.0331081 , -0.01035132,
           -0.03981033,  0.05873867, -0.04542492,  0.01465507,  0.01155537,
            0.03969387, -0.04650538, -0.12865868,  0.08653509,  0.08155747,
           -0.01284676,  0.07491223,  0.04722783,  0.05148136,  0.08652505])), (1.5823948341725016, array([-0.02579091, -0.10323643,  0.06233093, -0.15631637, -0.01142927,
            0.04732042, -0.07678036, -0.10237832,  0.11412126, -0.06317361,
           -0.05834357,  0.22674486, -0.12169029, -0.08095512, -0.09161384,
           -0.00748711, -0.06773762, -0.05158664,  0.0104614 , -0.03208893,
           -0.01127987, -0.07419635, -0.03121154,  0.04688368,  0.04296296,
            0.01776535, -0.01940218,  0.10655432,  0.007984  , -0.06009444,
           -0.08995488,  0.07290427, -0.04851162, -0.00111173, -0.03052541,
           -0.00506415, -0.04276741,  0.01211364, -0.06485362, -0.03264406,
           -0.01695773,  0.06618993, -0.01794634,  0.07672332,  0.18427271,
            0.00645225,  0.1060034 , -0.02057187, -0.0020446 ,  0.02221682,
            0.11947425, -0.01927545, -0.48625332,  0.48400686,  0.0765737 ,
           -0.07828707,  0.00855355,  0.04808957, -0.20662059,  0.08991486,
            0.13986082,  0.02992472,  0.01678794, -0.00525954,  0.14152616,
            0.17926009,  0.00538621, -0.05599342, -0.0716547 , -0.08164325,
            0.02650792, -0.0351676 ,  0.00165846,  0.09514562, -0.07842145,
            0.03097681, -0.03501136, -0.0624873 , -0.00370757,  0.11001634,
           -0.06511834,  0.00478484,  0.0232029 , -0.00802633, -0.05344581,
           -0.03140204,  0.02211096, -0.04893219, -0.14940765, -0.12861492])), (1.5166979298471606, array([ 0.03512953, -0.05998021,  0.10666813, -0.12151286, -0.0823675 ,
           -0.01772739, -0.04739566, -0.04986913,  0.017378  , -0.07612813,
           -0.06798112, -0.12944801,  0.22644149, -0.07273589,  0.06650317,
            0.02817224,  0.02324206,  0.03927625,  0.01264389,  0.0187637 ,
           -0.02279888, -0.08033275, -0.08179817,  0.02656226,  0.07572413,
           -0.03952647,  0.13790474, -0.1450146 ,  0.1294266 ,  0.00847142,
            0.01929373, -0.1169406 , -0.12399375,  0.02649847,  0.03841585,
           -0.01188958,  0.08773861, -0.01351911,  0.0249923 ,  0.01544068,
            0.01212473,  0.23575178,  0.08854544, -0.02308546, -0.08859317,
           -0.00819852,  0.41200847, -0.10110496, -0.11640363, -0.04605829,
            0.39971337, -0.05494921, -0.07489579, -0.00541504,  0.15577549,
           -0.12236071, -0.02983097, -0.15292934,  0.2874068 , -0.11035494,
           -0.10694081, -0.04685742,  0.03030548, -0.03453339, -0.11326956,
           -0.07205484, -0.02058402, -0.01707269, -0.05880667,  0.01668012,
            0.0440469 , -0.03878011, -0.08398554, -0.07418455, -0.02799173,
            0.10882041, -0.01572402,  0.13624518, -0.02676439, -0.05427701,
           -0.01529219, -0.05176724,  0.05367334, -0.00107446, -0.00264562,
            0.05796855,  0.11527288, -0.04911701, -0.15494206,  0.11725425])), (1.4840050960170488, array([-0.01647521, -0.07834627, -0.03795622,  0.0359174 ,  0.03689281,
           -0.00897316, -0.04864121, -0.07710089,  0.05563915,  0.38743286,
           -0.14582129, -0.01310922,  0.06974777, -0.01969408, -0.06385795,
           -0.04212032, -0.00445327, -0.02754135,  0.01184245, -0.0039486 ,
            0.02341597,  0.51007806, -0.16105521,  0.03972466,  0.0224396 ,
            0.0429155 ,  0.02903634, -0.02010313,  0.05042578,  0.00409658,
           -0.02356482,  0.00522301, -0.00311867, -0.02191703, -0.04233229,
           -0.01161743, -0.06236782, -0.0080576 , -0.02258844, -0.02122084,
           -0.0362343 , -0.02652322,  0.03201913, -0.00857236, -0.02097777,
           -0.01584929,  0.00656723,  0.01599246, -0.04447129, -0.00803429,
           -0.01061251,  0.10817242, -0.12478379,  0.08557665,  0.0460519 ,
            0.47181799,  0.16525984, -0.20280355,  0.17355343, -0.19177728,
            0.1237945 , -0.04323163, -0.03597109, -0.00419303,  0.08125443,
           -0.00628357,  0.04920573, -0.00522381,  0.17418589, -0.01709192,
            0.00150704,  0.01570768, -0.04528673,  0.04077768, -0.0136485 ,
            0.07800799,  0.04705179,  0.04622736, -0.02957357, -0.02562351,
           -0.04611269, -0.03345367,  0.0315392 , -0.03166596, -0.06843772,
            0.01994364,  0.02590641, -0.00907726,  0.03793356, -0.06581374])), (1.3921255385114626, array([-7.44798782e-02, -2.04057215e-02, -1.70213024e-01,  1.59969471e-01,
            1.16745291e-01,  1.22570505e-02, -8.60491578e-03, -3.35444803e-02,
           -1.63747860e-02, -8.81691725e-02,  2.29829604e-01, -1.91980144e-01,
            2.00935995e-01, -9.24163310e-02, -3.58916386e-01, -1.55938407e-01,
           -8.01419058e-02, -1.67543214e-01, -1.58590080e-02, -3.33998688e-02,
            1.42090660e-02, -1.01937834e-01,  3.97490694e-02,  3.11486801e-02,
           -3.28015525e-02,  1.03256895e-01, -1.58237285e-01,  7.35541952e-02,
            8.49211152e-02,  3.08494796e-02, -1.74544509e-01, -8.39287128e-03,
            2.78487996e-02, -9.45944424e-02, -1.31726300e-01, -4.56532786e-02,
           -2.77369029e-01, -1.65078423e-02, -1.47011482e-01, -1.20544364e-01,
           -1.24713634e-01, -2.12675713e-01, -1.59140924e-06, -1.70492395e-01,
           -4.71749996e-02, -2.64371816e-02,  1.93074103e-01,  5.10871246e-03,
           -4.51392455e-02, -1.12503289e-02,  1.79238763e-01, -3.64441742e-02,
            4.17966706e-02,  9.30520985e-03, -7.65341683e-02, -1.14372474e-01,
           -3.05317310e-02,  3.57304046e-03,  6.67812385e-02, -1.18929855e-01,
            1.15080118e-01, -2.72141011e-02, -6.35371089e-02,  1.76916211e-03,
            7.65631431e-02,  1.08834391e-02, -3.97408678e-02,  9.80381325e-03,
           -7.29636808e-02, -3.18072086e-02, -1.22301352e-01, -4.27613032e-02,
            5.05198151e-02,  2.49743560e-02,  5.62910547e-02,  8.95831493e-03,
            3.61783200e-02,  3.30025629e-02, -7.22496957e-02, -5.24225212e-02,
           -3.43834525e-02,  3.51711601e-02, -1.93835937e-02,  3.89598662e-02,
           -3.88440865e-02, -4.38094026e-03,  7.82669060e-02,  3.50383193e-02,
            1.99411248e-01, -1.66581180e-01])), (1.338123871891159, array([-0.00548775, -0.03843159, -0.03061153,  0.04333244,  0.13993639,
           -0.03706519, -0.02074415, -0.03732315,  0.03333833, -0.01952747,
            0.22987535, -0.21522097, -0.06384675,  0.08397588,  0.34627929,
           -0.04155743,  0.15962213,  0.17622303,  0.03179456, -0.00631348,
            0.01499474, -0.03142747,  0.07360389,  0.03405806,  0.05415617,
           -0.04012176, -0.21879165,  0.10111445, -0.06312339, -0.02052279,
            0.07695396, -0.03795474,  0.02675084,  0.09266905,  0.02364962,
            0.00294061,  0.32037628,  0.04085849,  0.22018367,  0.15760833,
           -0.07506903, -0.26341749,  0.045752  ,  0.046454  ,  0.0182172 ,
           -0.00876758,  0.02950111, -0.07382824, -0.09627891, -0.03991851,
            0.05040326,  0.00198497, -0.07373555,  0.24160928, -0.26190678,
           -0.01717238,  0.05946142, -0.07260429,  0.15436931, -0.14236139,
           -0.00075114, -0.02015509,  0.12668078, -0.02670258,  0.01202094,
            0.07757646, -0.01062797, -0.00859107, -0.03864637, -0.04924234,
           -0.12007782, -0.04770347,  0.01634402,  0.02039453,  0.00370991,
            0.06774831,  0.07424346,  0.01216043,  0.06891506, -0.01504565,
           -0.05261312,  0.05696704,  0.06306901, -0.0986121 , -0.09694006,
           -0.02580626,  0.00048615,  0.025849  ,  0.04712735, -0.18562838])), (1.2745588269563284, array([-1.35554753e-03, -6.07757998e-03,  1.94705449e-02, -3.64873032e-02,
            2.57177745e-02, -1.73766948e-03, -7.66767391e-03, -5.99583989e-03,
            1.28081101e-02,  3.13125071e-03,  7.72361809e-02, -9.08219621e-02,
           -3.02525331e-02,  1.04350856e-01, -1.07304336e-01,  5.79268206e-01,
           -1.92585215e-02, -1.43456269e-01,  1.44049771e-02,  5.54833874e-03,
           -3.02947376e-03,  1.53579737e-02, -1.61490155e-05,  1.64651561e-03,
            1.70440407e-01, -9.39241300e-02, -1.00536529e-01,  1.19542486e-01,
           -8.20623539e-02,  2.74726274e-02, -1.98189598e-02, -4.39066930e-02,
            5.74034124e-02,  1.96783609e-02,  2.65304936e-01,  1.24948072e-01,
           -2.00105962e-01,  4.30617383e-03, -4.64578459e-02, -2.17425881e-02,
            4.94733434e-01, -2.54029837e-02,  4.39087477e-02, -1.04778047e-01,
            2.20919184e-02, -5.46829160e-03,  8.31957137e-03,  1.71626353e-01,
           -3.94012165e-02, -9.76566932e-02,  2.29359231e-02, -4.01148302e-02,
           -2.68252725e-04,  1.11777672e-01, -1.67456052e-01,  3.17300476e-02,
           -1.64373986e-02, -9.39181037e-02,  1.05514889e-01, -2.96055848e-02,
           -5.97183383e-02,  7.01084344e-02, -5.40776784e-02, -2.26117483e-02,
           -2.87330811e-02,  6.69012887e-02, -3.01012039e-02, -1.84687406e-02,
            6.26595732e-04,  1.64181008e-02, -5.79387053e-02, -1.26763711e-02,
            1.79376732e-02, -4.92679658e-02,  9.23604464e-03,  1.07068444e-02,
            2.38723881e-02,  1.23594923e-02,  1.44704544e-02,  5.22613286e-03,
           -2.06610793e-02, -5.53429277e-02, -2.80537402e-02,  3.68670983e-02,
            7.42196857e-03, -9.91201404e-03, -2.55594189e-02, -2.38138885e-02,
           -2.70204929e-02,  1.48195145e-02])), (1.2235863310684403, array([ 0.02162233, -0.02507778,  0.00346684,  0.0337724 ,  0.06339719,
           -0.01546107, -0.00802168, -0.01952018,  0.01578492, -0.00374358,
           -0.0030274 ,  0.06604012, -0.29852924,  0.43570771, -0.01070749,
            0.00227028,  0.27598502, -0.19355632,  0.01200504,  0.00137478,
            0.01009588, -0.02943156,  0.04368344, -0.03455022, -0.09548836,
           -0.02263805, -0.0043047 , -0.14906378, -0.03932275,  0.210181  ,
            0.13741778,  0.03943821,  0.06420845,  0.01942264,  0.02616773,
           -0.00049999, -0.2467183 , -0.06123914,  0.25875485, -0.07995135,
           -0.04528393, -0.03659067,  0.03028853, -0.037632  , -0.06830891,
            0.03123251,  0.08586152, -0.08303141, -0.02826377, -0.10886284,
            0.07045707, -0.00895639, -0.05643817, -0.07005928,  0.21322098,
           -0.07408769,  0.15301434,  0.04851492, -0.03619714, -0.00610947,
            0.05533285, -0.0628817 , -0.20534149, -0.00580645, -0.02051046,
           -0.03196354,  0.04860021,  0.05638867, -0.03977628,  0.01021369,
            0.03052457, -0.01043534, -0.10746098, -0.01933172, -0.00225848,
           -0.02649758,  0.02510083, -0.01685237, -0.10001644, -0.0289755 ,
            0.01190971,  0.05171331,  0.29344975, -0.06493778, -0.10987834,
            0.09982846,  0.07016599, -0.05809229,  0.00350068, -0.11836791])), (1.2178118832338365, array([-1.82521939e-02,  3.98266974e-02,  4.37101951e-02, -1.09924289e-01,
            3.81072863e-02, -1.04404282e-02,  5.66479026e-02,  3.40861744e-02,
            3.19219948e-02, -2.82643180e-04,  4.60365840e-02, -1.36618844e-01,
            5.69508913e-02,  1.79428208e-01, -6.80695887e-02, -2.57879383e-01,
            4.86807903e-02, -1.58414203e-01,  1.91672236e-02,  1.04592425e-03,
           -1.56431618e-02,  3.50710634e-02, -7.20908512e-02,  2.81295729e-02,
            3.12644629e-01, -3.29212614e-01,  1.65765318e-01,  2.72942611e-01,
           -2.49384356e-01,  1.43915743e-01,  7.70159238e-02,  3.60081867e-02,
           -1.32977978e-02,  5.37143042e-02, -4.73669917e-02, -5.38248529e-02,
           -1.66768399e-01,  3.17515143e-02,  2.62159366e-02,  5.40694050e-02,
           -2.64704135e-01,  1.22058412e-01, -7.54748962e-02, -7.31124780e-02,
            2.54286662e-01,  1.81960202e-02, -7.81860561e-03, -5.30373138e-02,
           -2.94341276e-02, -2.58421362e-02, -3.85029912e-02, -4.36028096e-02,
            4.67960202e-02, -4.67573892e-03, -6.04959326e-02,  3.44279004e-02,
           -4.32294352e-02, -1.27615956e-01,  7.55820622e-02,  3.45839877e-02,
           -1.19639293e-01,  1.45885832e-01,  4.67547906e-03, -7.37539521e-03,
           -3.27779542e-02,  6.05062047e-02, -7.71285160e-02,  9.55809826e-04,
            4.73524121e-02,  3.80451458e-02,  6.64970709e-02,  1.19055043e-02,
            3.29736108e-02, -1.62921568e-01,  2.94258653e-02, -2.22492552e-02,
            2.72471490e-03, -4.42259233e-02,  9.79541765e-02,  8.86809373e-02,
            2.98735972e-02, -3.75493601e-02, -9.15437233e-02,  1.57466241e-01,
            3.52801176e-02, -1.31400982e-01,  6.26552634e-04, -4.00723013e-02,
           -1.13996015e-01, -5.55332480e-02])), (1.1943992057853123, array([-0.00931702, -0.00424866, -0.01216005,  0.004068  , -0.02263428,
            0.00471673,  0.03084057, -0.00579687,  0.01061565,  0.00286548,
           -0.05429538,  0.0424345 ,  0.17139521, -0.3992305 ,  0.06337345,
            0.03866668,  0.50312022, -0.17641882, -0.04637666, -0.05400847,
           -0.00154834,  0.00562841, -0.01500136,  0.01718973, -0.02036575,
            0.08496288, -0.03571607,  0.04040876,  0.05591341, -0.080265  ,
           -0.17623252, -0.02222803, -0.03419361,  0.13241458,  0.04712548,
           -0.03352416, -0.12270217, -0.00351138,  0.52701109, -0.04683177,
            0.03675955,  0.04148061, -0.02656703, -0.07510523,  0.05351867,
           -0.0249037 , -0.0318897 ,  0.02718563,  0.01596879,  0.05101118,
           -0.02903321,  0.06317588, -0.01326512, -0.03020909,  0.05275157,
            0.01706168, -0.05856816,  0.03988794, -0.05196665,  0.01620932,
           -0.04609711,  0.13712312,  0.03485711,  0.01156724,  0.03409038,
           -0.05809907, -0.03100509, -0.06844818,  0.00742539, -0.0287335 ,
           -0.01777249, -0.01164615,  0.05614855,  0.01500253, -0.0393923 ,
           -0.04277097,  0.09562431, -0.04900554, -0.06115623,  0.00210678,
           -0.00588897, -0.10569315, -0.05087745,  0.14847456,  0.06302812,
           -0.06454742, -0.01456505,  0.02526081,  0.02779199, -0.06739155])), (1.1835468203512876, array([ 0.03717678, -0.02535452,  0.02706796,  0.02121948, -0.00643199,
            0.06230364, -0.04062001, -0.01679781, -0.00330309,  0.05101256,
           -0.03300083,  0.02863528, -0.04126303,  0.08946389, -0.04040111,
           -0.05012179, -0.1012712 , -0.14580506, -0.01925575, -0.05337933,
            0.01891649, -0.00116377,  0.06500961,  0.03899285,  0.0213303 ,
            0.07741037, -0.17050636, -0.16609609,  0.07488031, -0.08878289,
           -0.1630476 ,  0.11557036,  0.18380201,  0.082032  ,  0.10960807,
            0.03167498, -0.10141503,  0.23255009, -0.11216596,  0.22622171,
           -0.11895868,  0.02902314,  0.07728901, -0.11490983, -0.10045019,
            0.00619202, -0.04623066, -0.13040619,  0.17791992,  0.00567379,
           -0.01654318,  0.03274446, -0.05062707,  0.01265202,  0.05737324,
           -0.0064413 ,  0.09038616, -0.01321635,  0.0524206 ,  0.02059814,
           -0.35873099,  0.27435224,  0.338374  ,  0.07888839, -0.11605764,
           -0.03570986,  0.01418727, -0.08400272, -0.02308667,  0.02757074,
           -0.02740659, -0.04492104, -0.04322222,  0.0325053 , -0.09134924,
            0.01900407, -0.05265527,  0.01393194,  0.1479187 , -0.02768956,
           -0.01705519, -0.19410776,  0.13552946,  0.05032767,  0.01248076,
            0.20909184, -0.08041327,  0.0217424 ,  0.02548379, -0.14547926])), (1.175215028494778, array([-0.01003806,  0.00929693, -0.00839073, -0.00384031, -0.02025381,
            0.01108479,  0.03082652,  0.00684872,  0.0040802 , -0.00835238,
           -0.00888267,  0.02343615,  0.00107126, -0.02599632, -0.01324817,
            0.02040201,  0.06855982, -0.1738684 , -0.02001343, -0.00324887,
           -0.00960466, -0.00413402, -0.00319067,  0.06760916, -0.03168774,
           -0.04264339,  0.04942483,  0.03072312,  0.02012888,  0.09466353,
           -0.06283886,  0.09247561, -0.16311953, -0.00957255, -0.18114619,
            0.01839285, -0.12161602,  0.06257645,  0.07093488,  0.41841154,
            0.09921985,  0.03727513, -0.0018813 , -0.13057926,  0.03516641,
            0.00174659,  0.02598584,  0.16304889,  0.07714149, -0.05721265,
            0.01428726, -0.04961955,  0.0233701 , -0.02559996,  0.0155587 ,
           -0.01602477, -0.01613317,  0.00661517, -0.02768675,  0.00053859,
            0.21964819, -0.41933767,  0.27741611,  0.0572856 ,  0.03932133,
           -0.01142053, -0.01319052,  0.03717639, -0.0075554 ,  0.00712815,
            0.03087199,  0.01612459, -0.04848844, -0.00117159,  0.03997727,
           -0.06363599,  0.04734225,  0.01954812,  0.45531874, -0.07439785,
            0.02489246,  0.03331119,  0.01755671, -0.22160939,  0.01271444,
            0.02131253,  0.0366985 ,  0.07920369, -0.01420712,  0.07780367])), (1.1607311253642982, array([ 2.87769241e-02, -8.58732180e-03,  5.11720150e-02, -3.63311597e-02,
           -1.14374568e-01,  4.54263578e-02, -5.78887157e-03, -2.58465735e-03,
           -1.40020678e-02,  2.94326379e-02, -6.44111057e-02,  4.26378371e-02,
           -4.62103817e-02,  1.10513405e-01, -6.83260112e-02, -1.32374633e-01,
            1.57587691e-01, -4.88596364e-02,  6.47743069e-02,  1.21486343e-01,
           -2.98135439e-02,  2.38420919e-02,  4.41077246e-02, -2.58264723e-03,
            1.19040856e-01,  7.37694514e-02, -2.99108580e-01, -1.43489044e-01,
            1.30655959e-01, -6.51230081e-02,  2.07728568e-01,  8.58640129e-03,
           -7.58933110e-02, -2.93781059e-02, -1.13635802e-01,  1.40724954e-02,
           -1.51357057e-01, -1.16784720e-03,  1.09990489e-01, -8.49939518e-02,
           -1.03762028e-01,  1.61494462e-01,  8.54602535e-02,  1.05684727e-02,
           -1.58935743e-01,  7.89426677e-02,  1.10384311e-02, -2.27832673e-02,
            1.53458821e-01,  5.05888760e-02,  3.94104789e-04, -9.09718920e-02,
            3.38157682e-02,  2.34985521e-01, -4.08935670e-01,  5.66997577e-02,
           -1.70489242e-01,  6.78470832e-03,  1.46005574e-02,  5.25839206e-02,
            5.42972822e-03, -6.47901863e-02, -1.28497758e-01,  4.30876884e-02,
           -4.98891316e-02,  9.35210472e-02,  4.30546194e-03,  2.88052594e-02,
            4.62027218e-02,  6.67473103e-02, -8.45943593e-02,  5.52658733e-02,
            5.18162141e-02, -2.87932257e-02,  2.95775228e-02,  7.21685520e-03,
           -1.11638155e-01,  9.76728264e-02, -1.98870673e-02, -1.41592603e-02,
            1.73892059e-03, -9.33084113e-02, -2.54801086e-01, -1.00897644e-01,
            1.03824118e-01,  2.08909237e-01, -1.09038150e-04, -2.96310066e-02,
           -3.01645687e-02,  2.05984863e-01])), (1.148470393300462, array([-3.61964599e-05,  2.65794249e-02,  3.60987797e-02, -6.30859519e-02,
            3.53506277e-02,  2.35960207e-02,  3.72210055e-02,  2.50778801e-02,
            2.32019288e-02, -5.01718666e-02,  8.58251787e-02, -6.55178810e-02,
            1.59312468e-02, -1.04240857e-02,  2.31768437e-02,  4.78816367e-02,
           -3.16944574e-02,  3.16321506e-02, -5.56196280e-02, -8.64990801e-02,
           -1.94608535e-02,  2.91957870e-02, -1.19483897e-01,  7.98492218e-02,
            2.69925041e-01,  2.97472967e-01, -1.46214798e-01, -3.88005261e-01,
            1.58821192e-02,  2.46261316e-01, -1.12748529e-01, -3.44188737e-02,
           -6.91164070e-02, -6.31484345e-02,  9.59373799e-03, -4.91259929e-02,
            1.29723837e-01,  4.52989851e-02, -1.12190427e-02, -6.85997785e-02,
            3.44038836e-02,  6.86329050e-02, -1.49008856e-01, -1.39422356e-01,
            2.51729095e-01,  6.54074575e-02, -3.38979458e-02, -3.67616247e-02,
            5.93783187e-02,  8.05105269e-02, -3.99044291e-02, -1.30776028e-02,
            2.26413898e-02, -2.68176282e-03, -3.03462048e-02,  3.52784116e-02,
           -2.99922388e-02, -6.90536366e-02, -2.80968997e-02,  7.02937675e-02,
            4.56930898e-02, -8.37216294e-02,  8.48566660e-02, -1.34127177e-02,
            1.41981688e-02,  9.92679438e-02, -3.34988956e-03, -2.94506250e-02,
            1.32539787e-01, -6.00787755e-02, -2.51151415e-02,  9.89540495e-03,
           -9.42027640e-02, -2.66558520e-01,  4.05104637e-02, -1.54805616e-01,
           -3.76218661e-02, -1.74510560e-01, -1.30386733e-01, -1.52019447e-01,
            1.40425150e-01,  2.42306623e-01,  8.44530920e-02,  6.53919613e-02,
            6.36447079e-02,  5.02101913e-02,  8.80789722e-03,  6.75093163e-03,
           -9.41898938e-02, -1.37811683e-01])), (1.1194893763589782, array([ 9.75212477e-03, -2.14980571e-02,  5.02994180e-04,  1.70836411e-02,
            4.84416567e-02,  5.29926893e-03, -1.76297741e-02, -1.84186047e-02,
           -7.34638265e-04, -9.65676047e-02,  1.62105007e-01, -6.97746038e-02,
           -9.07416941e-02,  1.00229032e-01,  7.87474923e-02, -1.30837342e-03,
            3.72779209e-03, -1.66772165e-01,  5.80607165e-02, -5.92666217e-02,
           -5.06231372e-02,  5.00420706e-02, -5.76942041e-02, -2.33076116e-02,
           -2.28977381e-01,  4.53010866e-02,  4.30024819e-01, -1.87498337e-01,
            2.95267449e-02, -5.09800705e-02, -1.76023158e-01, -4.50249163e-04,
            9.18409746e-02,  5.71992742e-02,  8.82050093e-02,  4.67645473e-02,
           -5.21500033e-02,  4.66011022e-02,  4.25656486e-02,  8.54790914e-02,
           -2.34339574e-02, -6.76744639e-02,  2.23879271e-02, -7.27912296e-02,
           -6.87466408e-02, -2.27698963e-02,  9.92739113e-03, -4.40896135e-02,
            2.10350772e-02, -2.74350657e-02,  1.63940431e-02,  3.26246694e-02,
           -4.88825022e-02,  1.08413925e-01, -9.60192428e-02,  1.04039975e-01,
           -2.07286354e-01,  7.28417267e-02,  2.54386057e-03,  1.68173847e-03,
           -6.82465550e-02,  9.38517009e-02, -2.04653192e-02, -1.25056428e-02,
           -1.79577705e-02,  3.59411927e-02, -3.10549696e-02, -1.86131538e-02,
            1.11532990e-01, -2.17296757e-02,  6.31582889e-02,  3.33960948e-02,
           -8.06849400e-03,  4.77517866e-03, -1.55688143e-01,  1.15610590e-01,
           -4.39679617e-03,  8.33406560e-02,  1.63903924e-02, -9.29607729e-03,
           -5.89541465e-03,  4.54125479e-01, -3.76626344e-01, -1.84220010e-02,
           -3.05357641e-02,  2.68322063e-02,  8.90767292e-03, -9.60404650e-02,
            3.64751766e-03,  1.39357795e-02])), (1.1096027631760017, array([-0.03055394,  0.02006923, -0.01184309, -0.03560009,  0.03543938,
           -0.03474633,  0.07979204,  0.01308007,  0.02382763, -0.12809165,
            0.17073957, -0.1205178 ,  0.0791998 , -0.06903869, -0.05864014,
           -0.03898209,  0.02089783, -0.09830454,  0.06454565,  0.13005107,
            0.01883959, -0.01667412, -0.14323759, -0.10465112,  0.15049143,
            0.20588045,  0.09559427, -0.24177582,  0.01675801, -0.32678463,
            0.22266998,  0.06855058,  0.06707538,  0.11026669, -0.11583863,
            0.03869523, -0.15933217, -0.04541662,  0.01445586,  0.07800317,
            0.05061759, -0.20758707, -0.02736484,  0.1909889 ,  0.09907509,
            0.02327433, -0.02997833,  0.13052392, -0.08841585,  0.1226507 ,
           -0.08356291, -0.00260784,  0.00180205, -0.00090463, -0.00061247,
           -0.03144258,  0.13068261, -0.09081928,  0.01904294,  0.01296567,
           -0.05249401,  0.10687873, -0.03820156, -0.00198812, -0.00053131,
            0.01615025,  0.15613067, -0.08968868,  0.05123437, -0.08701044,
            0.00300159,  0.00884269, -0.14801765,  0.05676924,  0.00349479,
           -0.0093203 , -0.16312161,  0.01851605,  0.13961157,  0.22166984,
            0.0680938 , -0.00547623,  0.20430917, -0.07466997,  0.00161567,
           -0.15138955,  0.17433659, -0.09265044, -0.03864607,  0.16236077])), (1.1049386135422454, array([ 0.00390238,  0.01124098, -0.00319226,  0.01276039,  0.02215105,
            0.00251291,  0.03190408,  0.01135741, -0.01215839, -0.02391252,
            0.06608213, -0.03203576,  0.01005787,  0.00093562, -0.1271413 ,
            0.0043077 ,  0.09066048,  0.13911112, -0.02572941, -0.06674581,
            0.01267361,  0.00205398, -0.00689407, -0.01659854,  0.21426127,
           -0.25949547,  0.18789231, -0.04147826, -0.04967394,  0.02929571,
           -0.23312165,  0.31962601,  0.08978283, -0.06819931, -0.1871029 ,
            0.05366093,  0.13418003,  0.11565465,  0.089745  , -0.28088451,
            0.09355484, -0.05455608,  0.06352312, -0.00994818, -0.06054641,
           -0.07468322, -0.00445091,  0.10529331,  0.15791134,  0.12185223,
           -0.03846646, -0.07495017,  0.00653686,  0.05899556, -0.08369328,
           -0.03960324,  0.072001  , -0.02163773,  0.03886216, -0.00853727,
           -0.05292571, -0.05905547,  0.13561911,  0.25990598, -0.04319963,
           -0.08102355, -0.00085609, -0.09659159, -0.01459406,  0.01992934,
            0.19136588, -0.0134315 ,  0.02139756,  0.10682446, -0.01656247,
           -0.02167538,  0.13389442, -0.00706919, -0.22614349,  0.07601096,
            0.04167494, -0.00923226,  0.0642536 , -0.25490954, -0.04759977,
            0.16218892,  0.13475305, -0.05883087,  0.01153269,  0.06958206])), (1.0914342813543214, array([ 0.02189241, -0.03443139,  0.00945862,  0.02380971, -0.00049311,
            0.01258217, -0.0174738 , -0.02829601,  0.00979614,  0.1627378 ,
           -0.1235408 ,  0.07101763,  0.00283509, -0.03326187, -0.01674985,
           -0.00679414,  0.01288757, -0.04424785,  0.05538182,  0.05505606,
            0.12584791, -0.07862583,  0.04550909,  0.13851923, -0.17950897,
           -0.06372055,  0.19302631, -0.06672413, -0.02183436,  0.1176496 ,
            0.01215119, -0.07630287,  0.10243681, -0.08379255, -0.05316322,
           -0.04903896, -0.01993359, -0.05983417, -0.00481602,  0.04853408,
            0.04414344,  0.04041471, -0.00319666, -0.11173285,  0.00396367,
           -0.04073454, -0.01805713,  0.03039337, -0.19989023,  0.36637598,
           -0.0792469 ,  0.14439615, -0.00384276,  0.17040613, -0.31548541,
           -0.14895144,  0.25802145, -0.08622959,  0.01933251, -0.00363105,
           -0.0267289 ,  0.03013368,  0.01289024, -0.10870153, -0.00955271,
           -0.16005998, -0.01934533, -0.00140748, -0.13589127, -0.0481737 ,
           -0.00224047, -0.14533548,  0.28976179, -0.17671099, -0.07014446,
           -0.14335317, -0.15035157,  0.02526417,  0.04124523, -0.07163569,
           -0.00110598,  0.09289375,  0.14714881,  0.11619341,  0.01358176,
            0.0716517 ,  0.10363371,  0.04587821,  0.01845485,  0.09531802])), (1.0846048546102214, array([ 1.36416745e-02, -3.76527152e-02,  5.28365265e-02, -6.71135070e-02,
           -3.35727071e-02, -5.32886764e-03, -5.07830935e-02, -3.29187958e-02,
            1.40798118e-02, -6.18918016e-02, -3.69930704e-02, -4.26516643e-03,
            3.49837879e-02,  6.77725364e-02, -7.07306194e-02, -4.57095020e-02,
           -4.05979345e-03,  1.08793143e-02, -1.85499966e-01, -5.90737535e-02,
           -9.00830552e-02,  9.51727326e-03, -2.42416783e-03, -1.30326030e-01,
            8.31577790e-02,  1.58379027e-01, -1.54707423e-01, -3.69055337e-02,
           -1.34675257e-01,  2.16944049e-01, -1.67535471e-01,  1.42152860e-04,
            1.40949060e-01,  1.29576748e-01, -5.52087481e-02, -4.38598682e-02,
           -2.11872636e-02, -1.64992513e-01, -5.00899381e-02,  1.04220238e-01,
           -1.85693009e-02,  1.69091507e-01, -6.76802710e-03,  2.43984780e-01,
           -1.34055526e-01, -9.39195572e-02,  5.67596962e-02,  7.37155245e-02,
           -2.13678509e-01, -2.10050985e-02,  4.96224151e-02, -4.66060006e-02,
            1.94950335e-02,  6.18430541e-02, -1.17830163e-01,  9.37997796e-02,
           -1.38472573e-01,  2.39545267e-02,  1.11274762e-02,  2.76187965e-02,
           -6.56220134e-02,  5.16874758e-02, -2.63416474e-02, -1.08027231e-01,
           -4.05131629e-02, -3.64657569e-01, -1.49822854e-03, -2.17486451e-02,
            9.54450737e-02, -1.93217907e-03, -4.04227011e-02,  9.06092703e-02,
            1.76503982e-01,  2.64354242e-01, -4.30849875e-03, -1.45214699e-01,
            2.28849015e-01, -1.66326592e-01,  1.17558268e-01,  1.23194184e-01,
            8.76691304e-03,  1.37073531e-01,  9.53029811e-02, -3.55104498e-02,
           -7.08018442e-02, -3.27677377e-02, -4.75275678e-02, -5.51551190e-02,
           -3.94044368e-02,  5.05800808e-02])), (1.0827345320601791, array([ 0.02608929,  0.01371157, -0.01316454,  0.07103393,  0.06334471,
           -0.01052515,  0.05838226,  0.01794481,  0.01605202, -0.08722451,
           -0.00212057,  0.01563993, -0.040189  ,  0.06106663,  0.04471129,
            0.03679496, -0.03609148,  0.05505417, -0.01942624, -0.074719  ,
           -0.07492881,  0.02029782,  0.01929931, -0.07387488,  0.10632031,
           -0.12726389, -0.06496204, -0.16477549,  0.12185701,  0.03147709,
            0.0849513 ,  0.27178893, -0.19333652, -0.1181441 ,  0.08843202,
           -0.09415497,  0.04062823, -0.18361823, -0.01325183,  0.13323152,
            0.04412477, -0.00660519,  0.01855727, -0.07165366,  0.03654398,
           -0.01583086, -0.00077891,  0.09646896, -0.17793183,  0.19398858,
           -0.06899618,  0.14358455, -0.08394391,  0.00725759,  0.09285048,
            0.04445943, -0.15460279,  0.13528103, -0.03727375, -0.00432682,
           -0.00749074,  0.00194684, -0.09744172,  0.22268596, -0.01705897,
           -0.2809283 , -0.02343944, -0.01155507,  0.04431471,  0.02610905,
           -0.09651504, -0.00723791,  0.11274514, -0.06014666,  0.07767799,
            0.31518473, -0.08841039, -0.00263312,  0.09328507,  0.02644484,
           -0.05111885, -0.13368294, -0.13157751,  0.13679515,  0.05311068,
            0.01467942,  0.2654361 , -0.05453342,  0.05882195, -0.26311322])), (1.0711889349293784, array([ 7.35032739e-03, -6.05294219e-03, -3.18961523e-02,  6.92206083e-02,
            2.11146054e-02, -1.80449277e-02,  1.73558765e-02, -4.30268813e-03,
            8.40429472e-03, -5.06717369e-02, -3.20152398e-02,  8.37848800e-02,
           -6.93265601e-02,  6.89973963e-02, -1.43561936e-01, -7.60267861e-05,
            1.08240437e-01,  9.97229941e-02,  6.25795502e-02, -2.68711516e-02,
           -8.95563476e-02,  3.11804891e-02,  2.92291847e-03,  6.99087376e-02,
            1.72306788e-01, -1.70191104e-01,  4.92745375e-02, -1.16659803e-01,
            8.01172970e-02,  2.19604626e-02,  4.57797290e-03, -3.18535647e-01,
            2.83039200e-01, -4.42865328e-02, -2.14172005e-01,  8.65819959e-02,
            4.59897609e-02,  1.67839804e-01,  4.74444353e-02, -2.04096693e-03,
            4.45972907e-02, -1.84784859e-02,  3.67002221e-02, -3.89093533e-02,
           -5.44284701e-02, -7.35226162e-02, -1.79040400e-02,  1.03478098e-01,
           -2.36485139e-01,  1.56970299e-01,  8.61790027e-04, -7.76936897e-02,
           -9.22260724e-03, -2.88677542e-02,  8.94937864e-02,  7.70136302e-02,
           -2.26364718e-01,  1.80732695e-01, -7.33757398e-02,  3.04025298e-03,
           -1.29459403e-02, -9.29285890e-03,  1.67078601e-01, -3.92809793e-01,
            3.15256265e-03,  1.38861430e-01, -8.32651408e-02, -4.50246873e-02,
            7.95186158e-02, -6.99649803e-03,  4.56267492e-02,  1.02014106e-01,
           -5.46238406e-02,  8.18037302e-02,  8.04374447e-02,  1.31697986e-01,
           -1.52482944e-01, -2.38383139e-02,  1.89103034e-03, -1.06697907e-01,
           -8.98353059e-02, -1.41209456e-01,  2.83590603e-02, -6.28238278e-03,
           -6.59767662e-03, -4.96758182e-02,  6.35906472e-02,  6.19351502e-02,
            6.61072339e-02, -6.05161345e-02])), (1.0636889264741323, array([-1.78741662e-02, -3.35305864e-03, -4.63080385e-02,  4.79209043e-02,
           -4.41204495e-02,  2.51674495e-02,  8.04611723e-03, -6.59300723e-03,
           -2.84730843e-02,  2.41629079e-02,  3.87817790e-02,  6.71582876e-02,
           -5.03348601e-02, -1.62125338e-01,  1.74122016e-01, -5.42585379e-02,
           -8.18383340e-02, -3.52185647e-02, -3.77420648e-02, -5.34238699e-02,
            1.17142345e-01,  1.94547224e-02, -7.08832856e-02, -1.27047112e-01,
            3.39755313e-01, -1.35466949e-01,  1.04595107e-02, -1.30035681e-02,
            7.39993858e-02, -5.81725018e-02, -1.44061875e-01, -1.99191906e-01,
            1.79286967e-01, -1.21190591e-01,  8.01792571e-02, -7.64494994e-02,
            1.66262543e-02, -2.39493290e-01,  3.83750787e-02,  1.83870406e-01,
           -7.47679673e-02, -8.07918404e-02,  6.24524946e-02, -3.99129133e-02,
           -1.42741438e-01, -7.21310505e-03,  4.96211043e-03,  8.48621898e-02,
            2.15515680e-01, -8.26625518e-02,  2.15747714e-02, -4.48358208e-02,
           -2.04433750e-04, -1.59766103e-02,  3.70072966e-02, -6.60417365e-02,
            1.58815573e-01, -3.26744578e-02, -5.40505568e-02,  2.52737200e-02,
            7.66776346e-02,  2.97530563e-02, -2.57170177e-01, -1.02254792e-01,
            3.18499749e-02,  6.61156058e-02,  4.39845954e-03, -4.25431053e-02,
           -1.01418002e-01, -5.75079706e-02,  1.35511675e-01, -6.17755409e-02,
            3.27842198e-02, -1.01334015e-02,  1.06259242e-01,  3.39699852e-02,
            1.81752213e-02, -2.13502123e-01,  1.44220602e-01, -1.17824890e-01,
           -2.69799529e-02,  3.04793508e-02, -4.85665970e-02,  5.10212773e-02,
            1.21739561e-03,  2.39552260e-01,  3.37575025e-02, -2.97237033e-01,
            7.24032687e-02,  6.35448512e-02])), (1.054552899346843, array([ 0.00189664, -0.02418795,  0.01666143, -0.02559334, -0.04723553,
            0.08762733, -0.01844839, -0.02246402, -0.00971444, -0.00164947,
            0.06009845,  0.05435715, -0.11430688,  0.05245795, -0.07366107,
           -0.09214164,  0.03386417,  0.17952946, -0.00685296, -0.06151238,
           -0.01797202,  0.02444293,  0.01174569, -0.09673668, -0.0569378 ,
            0.06625694, -0.09712417,  0.08872945,  0.04657131, -0.03924046,
           -0.08835877,  0.01618904, -0.02475417,  0.00304524, -0.16224386,
           -0.05104083,  0.13126887, -0.09246795,  0.0206345 ,  0.02608835,
            0.03934495,  0.01252745,  0.08147797, -0.24957468, -0.0223957 ,
           -0.07817746, -0.02509523,  0.3154523 , -0.13966946,  0.06831692,
           -0.02458838,  0.00925475, -0.05591175, -0.02468803,  0.13381471,
            0.02909932, -0.00545406, -0.03017539,  0.02461014,  0.02148193,
           -0.09844483,  0.14146013, -0.10725505, -0.04139617, -0.03799853,
            0.10957219, -0.02784257, -0.01238004, -0.05261667,  0.07543871,
           -0.08444205, -0.0357948 , -0.23817986, -0.24027782,  0.01643644,
           -0.32423113,  0.18525905,  0.38181714,  0.10277854,  0.21109554,
           -0.12364024,  0.06989283, -0.04347523,  0.05079479,  0.03984204,
            0.09982685, -0.03860539, -0.12303043, -0.03791448, -0.01685344])), (1.0495564454254236, array([ 0.01064236,  0.01687834,  0.00774352,  0.00608319, -0.0663222 ,
            0.02874585, -0.01472903,  0.01797059, -0.01661496,  0.00404952,
           -0.01793021,  0.02676529, -0.03435304,  0.06351985, -0.08183986,
           -0.05729709,  0.11622493,  0.1527865 ,  0.00436274, -0.09122086,
            0.05417958, -0.04566393, -0.01298813, -0.05333558,  0.05338584,
            0.00566828,  0.08695468, -0.00570582,  0.01474224, -0.05766692,
           -0.1724336 ,  0.14035376, -0.03804171, -0.15738222, -0.08266736,
            0.04978232,  0.05402185,  0.01735426,  0.1069416 ,  0.0049564 ,
            0.01182338,  0.08756753,  0.00285441, -0.00483389, -0.06538106,
           -0.19215988,  0.02330812,  0.16395316, -0.11995921, -0.30957911,
            0.08650203, -0.02370349,  0.04727246, -0.01419063, -0.05444469,
            0.00958458,  0.10879339, -0.09002233,  0.0206424 ,  0.0085553 ,
           -0.01636564,  0.03688603, -0.02780243,  0.21259285,  0.00504157,
            0.10054532,  0.06241784,  0.10927469,  0.06292144,  0.0116295 ,
           -0.23382899,  0.05088776,  0.06485522,  0.10607442, -0.03182513,
            0.03136658, -0.51409216, -0.09419313,  0.05395911, -0.09983506,
           -0.09806359,  0.12642438,  0.1162335 ,  0.16582754,  0.05573822,
           -0.12724108, -0.25410565, -0.01573826,  0.01954463,  0.0113436 ])), (1.048150719936755, array([ 6.32926518e-03,  1.93752966e-02, -6.53840904e-04,  1.27988606e-02,
           -2.23016780e-02,  2.23062907e-02, -1.25351527e-02,  1.94998288e-02,
           -1.66564719e-02,  1.12505578e-01, -2.63086807e-02,  1.07711355e-02,
           -2.42326174e-02,  4.17755806e-02, -2.74605434e-02,  1.52271476e-03,
            3.39426545e-02, -5.25063210e-02, -2.08645711e-01, -9.29383028e-02,
            1.55201829e-01, -1.05959186e-02,  1.46360546e-02,  7.17564931e-02,
           -2.21967333e-03,  3.61857593e-03, -9.09097678e-02,  8.68340523e-02,
            5.21987840e-02, -1.96675177e-02, -1.77688729e-01, -6.89211306e-02,
           -8.28044588e-03,  2.71875449e-01, -5.61288151e-02,  9.82268066e-02,
            1.41919011e-02, -9.46419057e-02, -2.21997555e-02, -6.73477724e-02,
            2.91777474e-04,  3.23841883e-02, -7.05894469e-03, -2.66193621e-02,
            3.08786946e-02, -1.17226362e-01, -3.41913073e-02, -6.49851301e-02,
           -3.64756074e-02,  1.53302816e-01, -2.73293681e-02, -9.04331988e-02,
            3.10836892e-02, -5.49365525e-03, -1.92861272e-02, -7.27238605e-02,
            1.20203541e-01, -5.49052633e-02, -8.10101525e-03,  2.82562220e-02,
            2.27497961e-02, -4.56225291e-02, -4.50583496e-03, -8.53459705e-02,
           -5.91057225e-03, -1.42297715e-01,  1.29745173e-02,  9.10621502e-02,
           -1.10595308e-01,  1.60057000e-01, -6.90532574e-02, -2.61851164e-02,
           -3.13303654e-01, -1.87659790e-01,  8.05732976e-02,  4.47115037e-01,
           -1.32055824e-01, -1.90136154e-01, -8.59790525e-03,  3.21439334e-01,
            1.76632464e-02,  1.51021356e-01, -9.00769966e-02, -1.30557213e-01,
            6.19954135e-02,  4.48562221e-02, -8.80594436e-02,  1.19985457e-01,
            7.29911054e-03,  5.90922916e-02])), (1.0416363339168457, array([ 1.46623820e-02,  1.54303991e-02,  1.79455242e-02, -4.32272660e-03,
            2.35530131e-02, -9.42976306e-03,  2.03551780e-02,  1.73751586e-02,
           -2.37510925e-04, -4.17669585e-02, -2.87456320e-02,  1.30099513e-02,
            6.80922864e-03, -8.69418294e-04,  5.81911122e-02,  6.02727274e-03,
           -3.76176668e-02, -2.61469369e-02, -1.77654206e-02, -1.02570906e-01,
           -1.40732267e-01, -2.30153512e-02, -7.99214816e-03, -8.46987746e-02,
            6.93018879e-02, -6.99778433e-02,  4.52420607e-02,  3.34853621e-02,
            2.43491611e-02, -1.19113095e-01, -7.75098025e-03, -1.21760579e-01,
            1.14685497e-01,  2.25428210e-01, -1.10637738e-01,  3.40121251e-02,
            1.57074247e-03, -8.49852459e-03, -3.95465819e-02, -5.60547606e-02,
            5.59753541e-03, -1.02641664e-02, -1.56158744e-03,  8.08025724e-03,
            3.88694509e-02,  5.18183511e-02,  2.87509168e-02,  4.34750839e-02,
            1.31356215e-01, -1.76073092e-01,  6.86705263e-02,  4.02618874e-01,
           -2.79742194e-02, -8.14257034e-02,  5.59803184e-02,  4.68295630e-02,
           -7.85045448e-02,  1.19823169e-02,  1.26926437e-02, -2.64786421e-02,
            9.56324796e-02, -1.10901569e-01, -1.39180410e-02, -6.63077191e-02,
            2.49854692e-02, -2.51737936e-02, -2.19902848e-01, -6.11794604e-02,
            5.36039258e-02,  6.68968645e-02, -2.45041001e-01, -8.18407346e-02,
            3.21649268e-01, -1.64446235e-01, -1.29230020e-01, -8.38560239e-02,
           -2.05545607e-01,  5.81414621e-02, -5.98264940e-02,  2.33343621e-01,
            2.48402369e-01, -3.12717029e-02,  9.50193588e-02, -2.30041984e-01,
            7.45476523e-02,  1.64200297e-01, -1.60027315e-02, -3.97527715e-02,
           -3.80590486e-04, -1.18257727e-01])), (1.0381369601703594, array([-0.01452019, -0.01310388, -0.00336089, -0.02087642, -0.02197353,
            0.05608255,  0.01455196, -0.01515219,  0.00758196,  0.02941782,
            0.06903329,  0.01999798, -0.07352511, -0.00069396, -0.0552507 ,
            0.0230458 ,  0.07026268,  0.06683153,  0.01384287, -0.12941273,
            0.07686492,  0.00546715,  0.0082493 , -0.00252667,  0.01492917,
           -0.00691446, -0.04071411, -0.01726711, -0.01130595,  0.22201318,
           -0.01064372, -0.05323656, -0.16254632,  0.01430831, -0.04516928,
            0.07372426, -0.0349304 ,  0.102635  ,  0.01170177,  0.0255023 ,
            0.04308332,  0.00141148,  0.02590048,  0.21401737, -0.1421845 ,
            0.12882908, -0.00345029,  0.02566968,  0.01811789,  0.0097195 ,
           -0.02130792,  0.14354843, -0.0388769 , -0.03878133,  0.08394915,
           -0.05331129,  0.11921897, -0.14265847,  0.07302256,  0.0005525 ,
           -0.05354975,  0.0696055 ,  0.02237221, -0.09216676, -0.01143389,
            0.01236562,  0.02386611, -0.04510227, -0.08527455, -0.04482601,
            0.10257254, -0.30206641, -0.05343934,  0.31309607,  0.32663363,
           -0.14205531, -0.21014011,  0.1071922 ,  0.02449223,  0.06901395,
            0.31163194,  0.06347318, -0.34719675,  0.06869492,  0.02253956,
           -0.01624024,  0.08994234,  0.10084055, -0.0057217 , -0.14646712])), (1.0334519547258465, array([ 5.77656009e-04, -6.14921415e-03,  1.52996270e-02, -2.56453126e-02,
            1.72353854e-02,  3.11267022e-02, -3.45374061e-02, -5.69263252e-03,
           -6.27046938e-03,  1.14765169e-02,  8.08286924e-03,  1.94396853e-02,
            2.26049114e-03, -9.07174977e-03, -1.92518361e-01, -4.57572252e-02,
            1.19314884e-01,  2.15268187e-01,  2.73962655e-03,  1.88035665e-01,
           -1.82317506e-02,  1.35226005e-02,  3.52944128e-02,  3.01395938e-02,
           -1.35578444e-01,  2.14035930e-02,  2.08187780e-02, -8.50202768e-03,
           -4.90208524e-02,  7.78140518e-02,  1.57332492e-01, -4.31036636e-02,
            4.91241884e-02, -6.15175316e-02, -1.73908341e-01, -2.43889746e-01,
           -2.10275312e-04,  1.67260936e-01,  8.27515080e-03, -3.85271443e-02,
            1.40574031e-01, -6.91170120e-02, -3.03533770e-02,  4.36044513e-02,
            6.53480795e-03,  3.48562972e-01,  3.81207646e-02,  2.12294650e-01,
            1.37878922e-01, -2.19623701e-01,  4.09898494e-02, -1.31708774e-02,
           -1.78900973e-02,  2.42607087e-03,  3.14801899e-02,  3.44814079e-02,
            2.59723498e-03, -2.00335054e-02,  3.68773565e-03,  1.75328683e-02,
           -6.44631385e-02,  6.46585338e-02,  5.75447016e-02, -1.56181005e-02,
           -1.47293524e-02, -1.61766587e-01,  3.15482798e-03, -2.36391663e-01,
           -7.59884246e-02, -1.98443256e-01, -4.50284225e-02, -1.03828833e-01,
            3.53474695e-02, -1.75930249e-01, -2.24283935e-02,  3.15305407e-01,
            1.11137397e-01, -1.87891103e-01,  1.00704819e-01,  6.24550923e-02,
           -3.41999807e-02,  1.13475557e-01, -2.60730071e-02,  1.43323613e-01,
           -1.56250938e-03,  1.02246588e-01, -1.27730473e-01, -3.73056009e-02,
           -2.51357486e-02,  2.17889221e-02])), (1.027681645874375, array([-0.01310177,  0.01012866, -0.03677743,  0.04007282,  0.02208434,
           -0.0266942 , -0.05120676,  0.00704602, -0.03124247,  0.12697724,
           -0.06647692, -0.00186713, -0.01114223,  0.08695685, -0.03312939,
           -0.03077175,  0.0148754 ,  0.02452668,  0.01383393,  0.00964574,
            0.28695189, -0.07626678, -0.06120312,  0.2266083 ,  0.12077144,
            0.00223681,  0.03746042, -0.23434108,  0.07633301,  0.07424533,
           -0.00093397,  0.04150766, -0.15909224,  0.23787766, -0.03534287,
            0.0836789 ,  0.00457522,  0.07283674, -0.06969312, -0.02903364,
           -0.01943321, -0.09437936, -0.01090542,  0.01431064,  0.00078577,
           -0.06842964, -0.00583048,  0.09406928, -0.11215129, -0.05293754,
            0.04177536,  0.10423728,  0.01473155, -0.02739377, -0.01496355,
           -0.05298254,  0.00411919,  0.11092835, -0.08741769,  0.01822837,
            0.01908275, -0.02756407,  0.0354024 , -0.02845482,  0.00891159,
            0.1017309 ,  0.1356951 , -0.05839743, -0.1241226 ,  0.19573594,
           -0.0861197 , -0.07853395,  0.15779429,  0.01583231, -0.08672095,
            0.0854994 ,  0.26404478,  0.0554805 ,  0.0046045 , -0.09116275,
            0.14921383, -0.13585461, -0.09655757,  0.02634402, -0.0284895 ,
           -0.23497157, -0.21359727, -0.42175165,  0.02789145,  0.04223427])), (1.0238189302225627, array([-3.34466103e-03,  6.38920502e-03, -3.61879641e-02,  5.70146856e-02,
            2.37921035e-02,  1.36315975e-02, -3.13055516e-02,  5.38838099e-03,
           -2.36030724e-02,  2.13974787e-02,  3.25216864e-02,  2.21149601e-02,
           -5.06359500e-02, -1.44759871e-04, -6.38409007e-03,  3.93762100e-02,
            2.92729685e-02,  4.69509273e-03, -1.87429134e-01, -9.71013168e-02,
            3.43571003e-02,  5.75364424e-02, -3.26306519e-02,  1.28664041e-01,
            7.77140198e-02, -4.93662232e-02,  6.92259270e-03, -4.87507395e-02,
            9.57992859e-02, -3.19971853e-02, -5.98961019e-03, -6.69245285e-02,
           -1.58410107e-01,  1.68967514e-01,  9.27890488e-02,  4.63448002e-02,
           -1.19614283e-02,  9.31119485e-02, -2.89798864e-02, -1.81792535e-02,
           -2.19585018e-02, -7.84419532e-02, -2.71152907e-02,  6.45713023e-02,
           -4.00682871e-02,  1.38066375e-01,  6.14912115e-03, -7.93693725e-02,
            1.36834518e-01,  4.22202882e-03,  1.75654360e-02, -2.64308583e-01,
            1.18320984e-02,  1.30152840e-02,  4.13198580e-02,  2.01704456e-02,
           -5.94070552e-02,  6.11268842e-02, -2.01341437e-02, -3.41170818e-02,
            3.44108864e-02, -1.73302164e-03,  2.84298457e-02,  3.41668876e-03,
            3.49527597e-02, -1.89583555e-01,  6.14688462e-02,  9.06845883e-03,
            4.80229754e-02, -1.62294049e-01,  2.15218250e-01,  3.15533805e-02,
            3.14501896e-01, -1.74348423e-01,  2.61057759e-01, -1.32057252e-01,
           -2.01552514e-01,  2.76094879e-01, -3.33620929e-03,  1.74634167e-01,
           -3.68153250e-01,  9.10455881e-02,  1.19570172e-01, -1.59658147e-02,
           -1.08267500e-02, -1.04737207e-01, -1.22950130e-01, -1.03392515e-01,
            5.15375610e-02, -3.83322394e-02])), (1.015216974820168, array([-0.01618276, -0.02360014, -0.02926011,  0.02127483,  0.00887336,
           -0.01031012, -0.03890124, -0.02537713, -0.00709545,  0.0183462 ,
            0.02564321,  0.03171079, -0.02552684, -0.05290231, -0.05858041,
            0.01397781,  0.04483173,  0.10630102, -0.09710735,  0.21751889,
           -0.16086921,  0.03953656,  0.01445865,  0.05711042, -0.05360371,
            0.04744034, -0.10515641,  0.06592877, -0.06065161,  0.19837085,
           -0.08167092,  0.001723  , -0.05500338,  0.0178873 , -0.02935533,
            0.23903312,  0.05112242, -0.20404582, -0.00608127,  0.05924824,
           -0.03380718, -0.13369095,  0.09619283, -0.16325832, -0.08844357,
            0.05882671,  0.00166718, -0.12655043,  0.08310361,  0.10773846,
           -0.0047094 , -0.0187615 , -0.0257083 ,  0.03410192, -0.00164383,
            0.06775397, -0.13520662,  0.03043049,  0.01271346, -0.02116568,
           -0.00908414,  0.07359619, -0.09341584,  0.15139447,  0.01570467,
            0.03237653, -0.36783292, -0.1593625 , -0.00612681, -0.02159892,
            0.23910808, -0.0372705 , -0.09220093,  0.02165323, -0.11853159,
            0.10044386, -0.19909785,  0.07842302,  0.10094837, -0.02818856,
            0.28365203, -0.00366906,  0.21658786,  0.07471534, -0.0130536 ,
           -0.20419061, -0.08081454, -0.1592173 ,  0.02630263,  0.15241717])), (1.0136313723660826, array([ 0.0047057 ,  0.00537221, -0.00935277,  0.02499465,  0.01681789,
           -0.00869954, -0.01181598,  0.00597274,  0.00296281,  0.05173311,
           -0.02052558,  0.02072112,  0.01177928, -0.05292222, -0.06871935,
            0.04976477,  0.04398419,  0.05734055,  0.28848477, -0.02145317,
           -0.02022251,  0.03238485,  0.0190972 ,  0.06581662,  0.02599537,
           -0.04334498, -0.02015788, -0.01891913, -0.00523579, -0.00176664,
            0.00598669,  0.01938069,  0.05690915,  0.25457602,  0.00047613,
           -0.43171399, -0.04496656, -0.15158992, -0.0630216 ,  0.01819853,
            0.08063392, -0.05862379,  0.01054687,  0.15530031, -0.0435065 ,
           -0.11169971,  0.00510733, -0.11902368, -0.03422134, -0.00074141,
            0.02469103, -0.08633909, -0.00487503,  0.00313326,  0.03130132,
           -0.00756799, -0.09661252,  0.04985106,  0.01110038, -0.0254607 ,
            0.0017376 ,  0.01130168, -0.05056147,  0.16722782,  0.00277896,
            0.11068128, -0.00414049, -0.33462666,  0.0250716 ,  0.44450508,
            0.13689589, -0.1373213 , -0.01468817, -0.07279611,  0.0601332 ,
           -0.1086719 , -0.10858592, -0.05833797,  0.07717707, -0.12419186,
           -0.09329828,  0.13223833,  0.03631919, -0.06444136,  0.00906957,
           -0.00439056, -0.04548441,  0.17891649,  0.03655766, -0.03796676])), (1.0118982712943438, array([ 0.00694201, -0.0018722 , -0.00281566,  0.01770142, -0.0496463 ,
            0.02999209, -0.00662784, -0.00043536, -0.00665007, -0.04269924,
            0.01364174,  0.0757743 , -0.08204845, -0.03454817,  0.116922  ,
           -0.07129044, -0.08191329, -0.15735237,  0.02548282,  0.28429731,
           -0.17313292,  0.00367994,  0.01073114, -0.12907741,  0.05887494,
           -0.09817751,  0.04700288, -0.04577495,  0.10941022, -0.01155122,
           -0.01163151, -0.03089187, -0.18285806, -0.0729333 ,  0.15773669,
            0.25080006, -0.00395445,  0.17083143, -0.00613639, -0.07731857,
           -0.16149356,  0.00252433,  0.04088276, -0.09682253, -0.06168423,
            0.02003816, -0.00543857,  0.15537622, -0.00710586, -0.02816829,
           -0.00369263,  0.15853635, -0.02918925,  0.0329961 , -0.05282505,
            0.0276843 ,  0.03692497, -0.0533023 , -0.00063083,  0.01558268,
            0.02200682, -0.03344913,  0.06746109, -0.0555776 ,  0.01370926,
           -0.11782188, -0.07465837, -0.33355448, -0.07237854,  0.27166404,
           -0.16659519, -0.0669088 , -0.07566776,  0.051166  ,  0.26762813,
           -0.06626345,  0.10173895, -0.15699645, -0.11477982,  0.02515217,
           -0.21230598,  0.16731697,  0.12222344,  0.04058821,  0.02654391,
           -0.14220437, -0.00807149,  0.07995315,  0.02476597,  0.07745563])), (1.0092805032957575, array([-1.67124034e-04, -8.52995967e-03,  2.43243338e-03, -4.55422356e-03,
           -4.47697903e-02,  4.11149142e-02,  4.35385290e-03, -8.08236351e-03,
           -2.23270588e-02,  4.54276255e-02,  2.24447699e-02,  3.72905391e-02,
           -4.58866899e-02, -4.43140590e-02, -3.17861262e-03,  1.63223223e-02,
           -1.44613585e-02, -1.95174729e-03,  4.26371234e-02,  5.62827516e-02,
            9.19523422e-02,  3.32038518e-03, -2.69015123e-02,  6.20466908e-02,
            1.08340927e-01, -8.06021310e-03, -8.97465505e-03,  1.76064776e-02,
            3.46254356e-02, -1.46048653e-01,  5.39475564e-02, -3.71279001e-02,
           -1.13952238e-02,  2.91946391e-02, -4.12869437e-02, -2.42218191e-01,
           -2.60603425e-02,  2.30730290e-01, -1.20611776e-02, -1.97548654e-02,
            8.48527712e-02,  4.62356706e-02,  3.98832642e-02,  3.56493600e-02,
           -1.63797314e-01,  1.28019247e-01,  5.54600893e-07, -2.04307386e-02,
            5.26514477e-02,  3.96836522e-02, -3.91024288e-02,  4.98853407e-02,
           -4.92069634e-03,  1.52943052e-02, -3.13390345e-02, -2.35876291e-02,
           -4.92749417e-02,  2.72170140e-03, -8.76600831e-03,  1.43572974e-02,
           -6.00816508e-03,  1.31966835e-02,  2.88326136e-02,  5.78179715e-02,
            8.31240235e-03, -1.10802895e-01, -4.18330250e-01,  5.32012386e-01,
            8.97901621e-02,  1.36166215e-01, -2.68026171e-02, -2.12739925e-01,
           -1.58496722e-01,  7.91803374e-02, -5.45815088e-02, -5.92837221e-02,
            6.01606163e-02, -1.03465752e-01, -9.80008579e-03, -4.96013262e-03,
           -9.50121718e-02,  1.54394759e-01,  1.14727370e-01,  4.03985262e-02,
            1.69140520e-02, -1.89851376e-01,  1.36970462e-01, -1.95413367e-01,
            2.32617269e-02, -6.62692406e-02])), (1.0062251615105764, array([-0.00527282, -0.01688188, -0.00468497, -0.00153288,  0.01518627,
           -0.02070147,  0.01180818, -0.01694398,  0.00687768, -0.02536273,
            0.02915883, -0.0167478 ,  0.01175675,  0.00314778, -0.07007524,
            0.01965608,  0.05609278,  0.0854933 , -0.30186512,  0.197778  ,
           -0.0217644 ,  0.05291083, -0.04169708, -0.32132285,  0.11409975,
            0.04440943, -0.14455571,  0.03647878, -0.01699231,  0.07447113,
           -0.00450323, -0.00360366, -0.01549301, -0.02353203, -0.0314328 ,
            0.0648022 ,  0.01905337,  0.16487256,  0.01712573,  0.02184085,
            0.014108  , -0.04253202,  0.02446466, -0.06003886,  0.00268156,
            0.05985967, -0.01170449, -0.06945482, -0.00672605,  0.007663  ,
           -0.02848922,  0.05508542, -0.02811783, -0.05140486,  0.1162407 ,
           -0.01263096,  0.0140447 , -0.0037905 ,  0.010043  , -0.03004912,
            0.02598565,  0.01165745, -0.00817854, -0.09190378,  0.024504  ,
            0.06228976,  0.29323598,  0.11533381, -0.11551801,  0.29810874,
            0.20107726, -0.17000041,  0.20522724, -0.05589527, -0.43514698,
           -0.00324024, -0.10267147, -0.01583151,  0.04671075, -0.05257544,
           -0.15347668,  0.09413035, -0.12768923, -0.01715487, -0.0009201 ,
           -0.06927188,  0.16380177,  0.10717406,  0.00399982,  0.02845797])), (1.0055267813795965, array([ 0.00554142,  0.00457078, -0.01396835,  0.03459154, -0.02500141,
           -0.01279514,  0.03819387,  0.00537668,  0.01086488, -0.03214228,
           -0.01963652,  0.04784108, -0.02883686,  0.03075782, -0.09864469,
            0.02593862,  0.04063425,  0.12849444,  0.00061827, -0.37222894,
            0.23468193,  0.0169766 , -0.01899214, -0.15736419,  0.03104597,
           -0.08805389,  0.06859634,  0.02762659,  0.06709668, -0.02284565,
           -0.05722747, -0.08353022, -0.03362415,  0.0023764 ,  0.06809688,
            0.01539331, -0.00184929, -0.00796171, -0.01225957,  0.00114309,
           -0.04512352,  0.06240917,  0.00397843,  0.02635132, -0.07712403,
            0.40446577, -0.00588481, -0.0807644 ,  0.13050743,  0.02701916,
           -0.04189661, -0.09141902,  0.01572212,  0.06521795, -0.10411642,
           -0.02482952,  0.0830768 , -0.0319343 , -0.01365172,  0.00232728,
            0.0491055 , -0.02366732, -0.04097945, -0.05273117,  0.02251404,
           -0.08276872, -0.03561784, -0.23494425, -0.04202028,  0.13276423,
           -0.22751433,  0.326788  , -0.11771476,  0.02225007, -0.22325178,
           -0.01935321,  0.04904005,  0.13896845,  0.08632332, -0.0863012 ,
            0.07668643,  0.03075034,  0.09717638,  0.07397294,  0.04806132,
           -0.22366605,  0.19813719,  0.08417379,  0.04081818, -0.01479638])), (1.0041959740870794, array([ 0.00822194, -0.02209017,  0.00179947,  0.01200197, -0.02659481,
           -0.00764602, -0.01011892, -0.01927093,  0.00778091, -0.05763482,
           -0.00875465,  0.04845326, -0.01277327, -0.02985779, -0.04904464,
            0.04505036,  0.03809419,  0.02666303, -0.06238701,  0.1059027 ,
           -0.24762375,  0.04243724,  0.01488936, -0.1477348 , -0.05574035,
           -0.02966242, -0.02771328,  0.14853575, -0.01779685,  0.04024805,
           -0.04920764, -0.03116516, -0.03367022,  0.14545037,  0.04058069,
           -0.29717955, -0.00541922,  0.34204901, -0.03090687, -0.01240487,
            0.05571747,  0.01055122,  0.03993054,  0.01027291, -0.02747238,
           -0.24191898, -0.01100649, -0.16332822,  0.04035829,  0.12041442,
           -0.03116228,  0.07143561, -0.01542356,  0.00947767, -0.01068069,
            0.04649149,  0.06563955, -0.02545183,  0.00819118, -0.02578114,
            0.02388872, -0.00179482, -0.00264004,  0.01443597,  0.02018351,
           -0.00036709,  0.17338862,  0.02049706, -0.2299085 , -0.05434517,
           -0.12314334,  0.36342366, -0.02171211, -0.07421302,  0.22051444,
            0.02311083, -0.09116037,  0.04618421,  0.03368351, -0.12647943,
            0.2145283 ,  0.09262533,  0.02206381,  0.01459231,  0.00105971,
           -0.01764596,  0.11883972, -0.33359707,  0.0272489 ,  0.03302657])), (1.0019590929564808, array([ 1.23686212e-02, -5.74858375e-03,  1.28667027e-02,  3.18889545e-04,
           -1.11497815e-02, -7.92571051e-03, -1.29181317e-02, -3.05287058e-03,
           -6.21565856e-03,  3.37501736e-02, -5.36172042e-02,  4.12248652e-02,
           -2.63260342e-02,  2.97645063e-03,  6.91072991e-02,  8.48349472e-03,
           -3.65125316e-02, -9.07608200e-02, -5.00614804e-02,  3.02119105e-01,
            5.47601346e-01,  1.22451603e-03, -2.54262020e-02, -2.63624794e-01,
            1.47799419e-01, -8.11331299e-03, -8.87887105e-02, -3.65799921e-02,
           -2.21336504e-02,  2.27787137e-02,  5.50243628e-02,  1.89375610e-02,
            1.02874548e-02,  8.69723865e-02, -5.44377833e-02, -9.23134139e-02,
           -6.87575836e-03,  8.50482063e-02, -2.91144402e-02, -5.19158070e-04,
            2.72972991e-02,  3.06226585e-02,  1.67418376e-02, -2.18970537e-02,
           -4.24693396e-02, -2.24557075e-01,  1.05633996e-02, -1.17081779e-02,
           -8.54985571e-02, -5.48092323e-02,  2.88630646e-02,  5.09284942e-02,
           -8.70277980e-03, -4.38590457e-02,  6.95909530e-02, -1.31847092e-02,
           -1.28531482e-02,  5.13641392e-02, -4.43967172e-02,  1.39497982e-02,
           -2.62894441e-03,  4.11256283e-03,  1.50977741e-02,  1.02750976e-01,
            5.59174440e-03,  2.15061961e-02, -3.03984631e-01, -1.64982643e-01,
           -2.21324250e-02, -3.01645261e-01,  3.32622827e-02,  1.78737544e-01,
            4.47817036e-02, -1.86339019e-02, -1.31331866e-02,  1.70534610e-02,
            7.72245188e-03,  1.52724696e-01, -5.91228172e-02, -2.68464737e-02,
            6.80235221e-03,  1.28991505e-01, -8.18517388e-02, -4.24656582e-02,
           -1.03033094e-02, -2.74849429e-02,  5.25201610e-03,  2.37418675e-01,
            9.48423952e-03, -5.23983741e-02])), (1.0005162465340969, array([ 3.04267717e-03, -1.29479763e-02, -4.44458112e-03,  1.33631972e-02,
           -9.33292211e-03,  1.54291205e-02, -4.73800601e-02, -1.16362669e-02,
            3.04550598e-03, -3.17009338e-02, -3.74833233e-03,  3.23998573e-02,
            5.86232357e-03, -7.50056174e-02, -1.70478255e-03, -9.50892730e-04,
            8.94366331e-03, -2.46116099e-02,  4.01358936e-01, -1.37224653e-01,
           -2.78080056e-02,  3.66459641e-02,  5.88338432e-02, -4.73700705e-01,
           -7.82532804e-02, -3.20303058e-02,  6.81517973e-03, -4.04013209e-02,
            3.08022972e-02,  1.42444360e-01, -4.70847004e-02,  3.52799485e-02,
           -2.20051786e-01,  3.64519976e-01, -1.65562998e-01,  1.86948483e-01,
            2.49965282e-02,  4.49718315e-04, -1.04893408e-01, -2.70135651e-02,
            3.39406504e-02, -4.57270480e-02, -3.20001865e-02, -9.09485457e-02,
            3.93327053e-02,  1.91371073e-01, -3.04620513e-03, -1.61414711e-02,
           -1.02539124e-01,  1.02693088e-01,  4.55301940e-02, -5.51003928e-02,
            3.75783992e-02, -1.67074257e-02, -2.24419941e-02,  4.82987324e-02,
            5.09746900e-02, -2.54777753e-02,  6.10867595e-03, -7.68573197e-03,
           -5.37906032e-02,  1.04949406e-01, -9.46918350e-03,  2.55699582e-02,
            3.51676513e-03,  1.13913878e-01, -5.14912042e-02,  1.92816611e-01,
           -2.01767844e-02, -1.29106051e-01,  5.41321727e-02, -8.14963631e-04,
            9.72048527e-02, -2.91598197e-02,  6.02243306e-02,  4.79375887e-02,
            3.04488822e-02, -2.10908861e-01,  1.83075881e-02, -1.12247174e-01,
           -1.22270980e-01, -6.68043476e-02,  4.66913562e-02, -4.16616129e-02,
           -3.00273118e-02,  9.55748449e-02, -7.54848923e-02, -2.59008996e-02,
            2.91027290e-03,  4.74474999e-02])), (0.9936805751250474, array([-0.00764089, -0.00593278,  0.02204877, -0.05256475,  0.00842695,
           -0.01501379,  0.02522124, -0.00706479,  0.01803719, -0.02255416,
           -0.02159876, -0.04619848,  0.0606624 ,  0.06655174, -0.13064015,
            0.04125599,  0.07308245,  0.14042212, -0.19026609, -0.08782066,
            0.22890378, -0.02309647,  0.02434943, -0.18001576, -0.27168967,
            0.16065591,  0.03658186,  0.08846365, -0.06826107, -0.06298364,
            0.04706691,  0.03390692, -0.01736851, -0.09409377,  0.05434106,
            0.12652597, -0.0080583 ,  0.15550196,  0.03400027,  0.02964071,
           -0.0185398 ,  0.05566647, -0.07869831,  0.049827  ,  0.15379865,
           -0.15366605, -0.00384803, -0.12618657,  0.0222475 ,  0.05088107,
           -0.0265037 , -0.06968476,  0.04039596, -0.02097603, -0.0149297 ,
           -0.06291175,  0.02400028, -0.0155054 ,  0.03367925, -0.0381606 ,
            0.03294177, -0.00982517, -0.04136651, -0.08259673,  0.01412005,
            0.11410254, -0.23336836, -0.22037642,  0.33824349,  0.16398198,
           -0.03704472, -0.17364007,  0.08558085, -0.01478625,  0.20608258,
            0.05323019,  0.00586049, -0.08892334,  0.0996065 , -0.07159233,
           -0.11680752, -0.08930236,  0.01112726, -0.05935329, -0.01780898,
            0.14924434,  0.15771004, -0.2990459 , -0.06971069, -0.00463926])), (0.9932619924220245, array([-4.80983034e-03, -8.33491749e-03,  1.43478570e-02, -3.39065901e-02,
            8.83024216e-03,  7.12824218e-03,  4.75620170e-03, -8.78881996e-03,
            1.12222071e-02, -5.61743165e-04,  6.82925646e-02, -5.90986511e-02,
            1.59812288e-02,  2.54101713e-03, -1.04896277e-01,  3.51737986e-02,
            6.67716613e-02,  1.29244810e-01,  2.98451160e-01,  7.20075288e-02,
            1.32998042e-01,  8.42103921e-03, -5.60365919e-02,  2.59706507e-01,
            1.64749395e-03,  1.48056990e-01, -8.53023881e-02,  4.35266088e-02,
           -3.46030524e-02, -8.41578541e-02,  7.04725333e-02,  2.81031391e-02,
           -6.20476632e-02, -1.09682870e-01, -3.49301258e-03,  2.10854352e-01,
            5.52984391e-03, -2.87410233e-02,  3.55661568e-02,  4.25020090e-02,
            9.28653091e-03,  9.60221857e-03, -1.80697323e-03, -4.10870665e-02,
            2.65839708e-02,  1.96398351e-02, -1.76642116e-03, -5.26396067e-02,
           -4.08825297e-05, -1.47565964e-02, -3.35712506e-02,  1.11012873e-01,
           -2.84716492e-02, -8.03036754e-02,  1.45558330e-01, -9.02451607e-03,
            1.73580715e-02, -5.84138883e-02,  7.40185841e-02, -4.79314092e-02,
           -1.53805564e-02,  2.60951831e-02,  2.65729607e-02, -5.54298295e-02,
            2.61356655e-03, -7.91097579e-02, -1.44960160e-01,  4.12691982e-02,
           -2.39118825e-01,  2.14212567e-01,  1.71250062e-01,  4.78605901e-01,
            1.71255685e-01,  6.99230903e-02, -9.17598680e-03, -9.36932301e-02,
           -5.15256326e-02, -2.10854344e-01,  4.20936287e-02,  1.67529425e-01,
           -8.86265705e-02,  1.31353716e-01, -9.90303711e-02,  2.10599781e-03,
           -4.12338886e-02,  1.80605862e-01,  4.81498794e-02, -9.01894399e-02,
           -3.02906660e-02, -9.57214123e-02])), (0.989390243013148, array([ 0.0027574 ,  0.0061329 , -0.02627846,  0.05095417, -0.00642473,
            0.00568102,  0.02281071,  0.00631694,  0.00755118, -0.06691076,
            0.06289873,  0.02585864, -0.03773079, -0.02254831, -0.16308598,
            0.04149849,  0.08420201,  0.19496387,  0.39379716,  0.17782559,
            0.11539291,  0.0197762 , -0.02419242, -0.05530074,  0.06499963,
           -0.09218429,  0.01384134,  0.05434109,  0.07868315,  0.01245479,
           -0.04835957, -0.06038021, -0.16823152, -0.02203941,  0.22528898,
            0.04443168,  0.00703952,  0.03605971, -0.01016621, -0.03127999,
           -0.11144527, -0.02134988,  0.04999845, -0.07390443,  0.00045527,
           -0.29444954, -0.00354035, -0.11068436,  0.14758104, -0.02428335,
           -0.01217597, -0.16673344, -0.03072165,  0.04867053,  0.02903683,
            0.02513363, -0.04544139,  0.04575122, -0.04047531,  0.01634853,
            0.04759149, -0.06510098,  0.02291705, -0.20123276,  0.00801083,
           -0.26857335,  0.18454691,  0.05340024,  0.24677043, -0.10037036,
           -0.09368322, -0.18139713, -0.02268909, -0.03296994, -0.1736654 ,
           -0.070339  ,  0.02794478, -0.01026421,  0.10866061,  0.16523264,
            0.21315456,  0.01944449,  0.07190493,  0.07717728,  0.03061382,
            0.07312778, -0.00948213,  0.03159338,  0.06226249, -0.01358497])), (0.9869860806882016, array([-2.43715809e-02, -1.07128953e-02, -2.79427241e-02,  3.89261512e-03,
            3.60937407e-02,  8.57047187e-03, -1.08440482e-02, -1.47852771e-02,
            2.35617945e-02, -5.52860667e-02,  8.90308989e-02, -3.95984242e-02,
           -9.60278839e-03, -1.10836591e-02,  6.15373627e-02, -8.70643248e-02,
           -2.15993332e-02, -3.64849330e-02, -6.41388458e-02, -5.37994962e-02,
            1.01884145e-01,  1.92638366e-02, -2.82527603e-02, -1.53857810e-01,
           -1.07907271e-01,  6.36331411e-02,  9.89305735e-02, -1.33425549e-01,
            2.25705919e-02,  1.88958506e-01, -1.01889068e-01,  7.94116198e-02,
           -1.12291832e-01, -9.18573887e-02,  2.79692392e-01, -3.16952567e-01,
            2.27300392e-03,  1.33228325e-01,  3.25689301e-02, -3.09505476e-02,
           -5.68545611e-02, -2.02397441e-01, -6.35443881e-03,  6.98686485e-02,
            8.01945161e-02,  9.92512023e-03,  1.27386606e-02,  2.51450191e-01,
           -3.89108234e-02, -6.28715652e-02,  2.70404571e-02, -2.01667306e-02,
           -3.59301358e-02,  1.05886343e-01, -1.00517967e-01,  5.56644863e-02,
           -4.85317142e-02,  1.73674110e-03,  2.56392918e-02, -2.74969992e-02,
            3.33287649e-02, -3.81662601e-02,  1.12145451e-02, -2.12696445e-01,
            6.24958136e-03,  5.91309092e-04, -1.93620042e-01,  1.22308284e-01,
           -8.50767120e-02,  2.07259697e-01,  2.06397007e-01,  1.49853779e-01,
           -3.93533862e-02, -6.78712515e-03, -1.85078872e-02,  2.14694222e-03,
           -1.44830363e-01,  8.07264254e-03, -3.51936040e-02,  1.82751221e-01,
            9.46179819e-02, -3.26090474e-01,  6.50092847e-02,  3.40593128e-02,
            4.38808117e-05,  1.04720971e-01, -1.48980575e-01,  8.79513418e-02,
            4.67517524e-03,  1.49568000e-01])), (0.9828875619572957, array([ 0.01595755, -0.01989033,  0.04337725, -0.04633459,  0.02648134,
           -0.04697169, -0.0191114 , -0.015711  ,  0.01704849, -0.01540329,
           -0.06637341,  0.02710851,  0.04839863, -0.10418241,  0.16225215,
           -0.01776017, -0.09851717, -0.18995589,  0.29041107, -0.06236597,
            0.22727756,  0.00658669, -0.01409841, -0.14199334, -0.0466045 ,
            0.13775554, -0.13467361,  0.13090866, -0.19777716,  0.10239725,
           -0.00367906, -0.02341806,  0.30707283, -0.21449997, -0.08920904,
            0.02326865,  0.06790668,  0.11017155,  0.06531824,  0.06059489,
            0.01953891,  0.04596282,  0.09166934, -0.16995796, -0.01987271,
            0.16604913,  0.03892869,  0.02906565, -0.04459147, -0.01199517,
            0.03124182, -0.06696273, -0.02876974,  0.02837694,  0.02627329,
            0.02119666, -0.11566776,  0.11415444, -0.01093605, -0.05076346,
            0.02776756, -0.04904545,  0.0584223 ,  0.11361133,  0.01445449,
           -0.17196192,  0.24766112, -0.00709194,  0.07229663,  0.18348122,
            0.05605101, -0.05606634,  0.01176013, -0.08243724,  0.17129387,
            0.08871317, -0.12892466,  0.10690118, -0.15630681,  0.12557654,
            0.12195274,  0.0135422 ,  0.0194999 , -0.05461132,  0.01772277,
           -0.16588352, -0.04162553, -0.18084166, -0.03946636,  0.00890658])), (0.9812541978117575, array([-1.58853556e-03, -2.77244697e-04,  1.30495900e-02, -2.57070737e-02,
           -2.38716681e-02,  1.76918901e-03, -1.73442425e-02, -5.66356465e-04,
           -2.92004497e-02,  1.10917867e-01, -6.27488840e-02, -1.78243115e-02,
            2.15038257e-02,  6.41694035e-02, -1.85885073e-02, -3.34623753e-02,
            4.91507234e-03,  1.85513793e-03,  1.99405167e-01, -4.64871501e-01,
           -1.81224305e-01,  3.78104386e-02, -1.92021780e-03, -1.00559003e-01,
            5.91510263e-02,  1.22049864e-01, -1.62150573e-01, -3.66253270e-02,
            3.51101906e-03, -5.65836709e-02,  1.34142652e-01, -1.98874144e-02,
            9.32473244e-03, -1.65789870e-01,  3.90059070e-02,  8.92804163e-02,
           -7.03717091e-02,  1.86738236e-01,  2.48520537e-02, -8.63464230e-04,
           -3.74401876e-02, -7.57339151e-03, -2.88428766e-02,  1.22443436e-01,
            2.08410599e-02, -3.60085882e-01,  1.84958173e-02,  4.52949299e-02,
            8.45531569e-02, -6.45286314e-02, -9.57236707e-03,  9.57283766e-02,
            4.89745997e-02, -3.24114570e-02, -6.36907782e-02, -1.65291836e-02,
           -8.62304468e-03, -1.83196203e-03, -3.28560945e-02,  4.24690729e-02,
            2.89588908e-02, -4.45793674e-02, -3.55242576e-02, -2.97437416e-03,
           -3.18511701e-03, -7.89019068e-02, -1.62655658e-01, -9.17582531e-02,
           -2.82301267e-01, -1.33934778e-01,  2.30107909e-01, -1.45985964e-01,
            8.49375746e-03, -1.38999842e-01, -1.13966250e-01,  1.39946369e-01,
            1.02468432e-01,  6.37333277e-02,  3.10513608e-02,  1.30701912e-02,
           -5.85314632e-02,  9.30604487e-02,  2.89469803e-02,  2.54664458e-02,
            1.47083627e-02, -1.94198651e-01,  2.29775840e-02,  4.11200067e-02,
           -2.41273580e-02,  9.62378425e-02])), (0.9698086941074693, array([-0.01071378,  0.00486143, -0.00196125, -0.01630911, -0.02259151,
            0.01407496,  0.00847955,  0.00253303, -0.0309036 ,  0.0140293 ,
            0.03300172,  0.01524013, -0.05273072, -0.04567607,  0.23391132,
           -0.07848098, -0.12769542, -0.21407682,  0.07116596, -0.03609987,
           -0.10502449,  0.04005191,  0.05065142,  0.17250256, -0.05749271,
            0.03372211, -0.13167271,  0.08150494, -0.04870796,  0.09539604,
            0.05417698,  0.05531372, -0.17647055,  0.11591562, -0.06368605,
           -0.09531928,  0.03512564,  0.09801424, -0.02272997, -0.03956482,
            0.0484233 ,  0.06916443,  0.01819234, -0.04563155, -0.08722159,
           -0.07701787, -0.02717661,  0.26806801,  0.00531044,  0.04114546,
           -0.09148475, -0.42609818,  0.08854235, -0.09061132,  0.12185133,
           -0.04883606,  0.13125812, -0.09962217,  0.04209341, -0.00978047,
            0.00611318,  0.01726034, -0.02390289, -0.15085736,  0.00502519,
            0.05113877, -0.09804292, -0.19753755,  0.08767721, -0.01680878,
           -0.09742722,  0.0012008 ,  0.14237304,  0.16035753, -0.31354836,
            0.09154538, -0.13866588, -0.00058064, -0.14211276,  0.03044497,
            0.00991763,  0.05382957, -0.02379609,  0.01326394,  0.00805447,
            0.04422099,  0.19718486, -0.13166665, -0.00919805, -0.05891948])), (0.9529273395960557, array([ 1.01680239e-02, -1.88980519e-02,  2.99406811e-02, -3.35410551e-02,
            2.60871214e-02,  1.87005328e-04, -4.50241781e-03, -1.58850270e-02,
            1.78052925e-02,  3.46440672e-02,  3.00147401e-04, -8.67904552e-03,
            2.08497368e-02, -4.51580137e-02,  4.06369697e-02, -2.56313711e-02,
           -2.62486361e-02,  6.18857441e-02, -1.22065618e-01,  2.46349337e-02,
            1.09344651e-01,  1.82070441e-01,  7.71152985e-03,  1.22815065e-02,
           -1.33315954e-01, -6.85951998e-03,  1.24658808e-01, -6.65173562e-02,
            1.06037135e-02, -6.65605576e-02,  8.48243247e-02, -1.20348568e-01,
            6.09639452e-02,  8.70517651e-02, -3.45482804e-02,  2.23037440e-01,
           -3.14108802e-02,  1.42127354e-02, -1.78074199e-02,  8.72644635e-03,
           -6.32716232e-02, -3.20996837e-02, -1.54585621e-02,  6.23968567e-02,
            6.59357152e-02, -1.16186844e-01, -1.81810459e-02,  1.96731998e-01,
            4.51921052e-02, -1.51162617e-01, -2.85625471e-02, -3.77574702e-01,
           -5.11642449e-02,  1.12291336e-01,  2.85677301e-02,  1.15832788e-01,
           -2.29778840e-01,  7.90781789e-02,  1.51605150e-02, -2.54163196e-02,
           -5.69090347e-02,  1.06857705e-01, -4.99492608e-02,  1.83663474e-01,
           -1.06629346e-03, -6.12071463e-02, -7.51621854e-03,  1.39727334e-01,
           -3.61690448e-01,  1.05352784e-01, -1.24486885e-01, -8.92559677e-02,
            6.51931348e-02, -1.23212333e-01,  1.09853846e-01, -1.25294871e-01,
            1.05331636e-02, -1.14362626e-01, -2.24907114e-03, -1.45530433e-01,
            2.37937775e-01, -2.62807130e-02,  8.51306367e-02,  2.80731422e-02,
            1.80940813e-02,  5.39997622e-02,  2.34065029e-01,  1.21637193e-01,
           -4.51543427e-02, -1.09308849e-01])), (0.949801385112714, array([ 0.026247  , -0.00047457,  0.01750291,  0.01778684,  0.02567284,
           -0.04074609,  0.01524609,  0.00458663,  0.0247263 , -0.08198198,
            0.03632847, -0.04948209, -0.0359562 ,  0.14245157,  0.05160305,
           -0.03531067, -0.02601146,  0.05392896, -0.08556124, -0.04018908,
           -0.02069455, -0.01690763, -0.01912764, -0.04071749, -0.08931593,
            0.0863003 ,  0.00435261,  0.01343055,  0.04213723, -0.26182683,
            0.12130055,  0.03108943,  0.02772986,  0.27414521, -0.17068434,
           -0.04992821,  0.05622691, -0.08057179, -0.06151518, -0.10373174,
            0.06792272,  0.13515434,  0.07802657, -0.37489407,  0.02056473,
           -0.09680857,  0.00834453,  0.1153062 ,  0.14089884, -0.18118251,
            0.01663615,  0.06270688, -0.06002869,  0.12750963, -0.12121124,
           -0.05682705,  0.13333067, -0.04704703, -0.01074914,  0.01353065,
            0.04337889, -0.06263388,  0.00747122, -0.14375484,  0.00436615,
           -0.2299231 ,  0.04541104, -0.02688675,  0.13416425,  0.05508127,
            0.37018287,  0.0655413 , -0.00761955,  0.08851763,  0.1076094 ,
           -0.03038559, -0.00648776, -0.07013712, -0.03279395, -0.17809777,
            0.10159005, -0.10103883, -0.04794133,  0.20119432,  0.03467796,
           -0.1360069 ,  0.00991476,  0.06485029,  0.01914473, -0.11337748])), (0.9442733887767248, array([-0.00331595,  0.01580543, -0.01863078,  0.02641719, -0.01756419,
            0.00863267, -0.05255544,  0.01428063, -0.03098377,  0.04713682,
           -0.01223649, -0.0234114 ,  0.01673431,  0.04137828, -0.04917881,
            0.02671128,  0.03935362, -0.02912825, -0.15252448, -0.16400054,
            0.0269342 ,  0.01335285,  0.0356738 , -0.06873641,  0.03472405,
           -0.08950344, -0.01980065,  0.07594835,  0.0620901 , -0.34044703,
            0.22865382,  0.07859782, -0.07444977, -0.1105835 ,  0.18634094,
           -0.09233799,  0.00439686,  0.12022529,  0.00175177, -0.03072499,
           -0.07645899,  0.05477025,  0.03649804, -0.21479675, -0.08708492,
            0.15125139,  0.04209627,  0.00873448, -0.30818838,  0.252356  ,
            0.07504972, -0.12498248,  0.011856  , -0.03461129,  0.07415387,
            0.08630299, -0.09882293, -0.0011802 ,  0.02215717, -0.01518408,
            0.03271371, -0.03960954, -0.00258443,  0.0248585 ,  0.00954393,
            0.06160335,  0.02599234, -0.10098387, -0.06614956, -0.08452039,
            0.09361263, -0.07846221,  0.06104535,  0.08054712, -0.04221732,
           -0.12005225,  0.01158872, -0.22989874,  0.04004718,  0.02458765,
            0.2916282 ,  0.17946123,  0.03770972, -0.12429102, -0.01816835,
            0.07699743, -0.30798999, -0.01072195,  0.02391488, -0.03918125])), (0.9416943150113966, array([ 3.61742058e-03, -1.94654621e-02, -1.86145277e-02,  3.91588971e-02,
            1.38525227e-02,  4.44739728e-03,  9.45754087e-04, -1.76770213e-02,
            2.02461721e-02, -1.61417044e-01,  6.29136317e-02,  5.17455870e-02,
           -2.53843405e-02, -1.11295932e-01,  1.30763281e-01, -3.64336748e-02,
           -8.15317828e-02, -8.43985908e-02, -1.27475240e-01, -2.00239008e-01,
            1.98557819e-01, -7.27339155e-02, -1.16408967e-02,  1.64224903e-01,
           -1.59373307e-02, -1.04195288e-01, -2.05862256e-02,  1.60821247e-01,
            3.37202388e-03,  1.16542567e-01,  1.55131159e-01, -1.04539639e-01,
           -2.05624726e-01,  3.19145268e-02, -2.09991272e-01,  1.19720329e-01,
           -7.45497298e-02,  1.99388651e-01, -4.70761420e-02,  1.88768117e-02,
            9.38880282e-02, -9.86437441e-02,  6.49199923e-02,  1.32814904e-01,
           -9.76731607e-02, -1.04217562e-01,  2.45565669e-02,  3.85863854e-03,
           -2.37514855e-02,  2.17459287e-02,  1.12076101e-02,  2.49809124e-01,
           -3.18286457e-02,  2.87709220e-02, -6.44097575e-02,  1.09883045e-01,
           -2.07807208e-01,  3.93896455e-02,  5.11527104e-02, -2.75265015e-02,
           -2.81618559e-02,  4.95628121e-02, -1.27479090e-01,  1.18502408e-01,
           -2.04002732e-02, -4.26501972e-02,  1.92286785e-01, -7.34691849e-02,
            1.73694750e-01,  1.13387762e-04,  1.12026592e-01,  6.61066412e-02,
           -2.07059118e-01, -6.88492163e-02, -1.43136942e-01, -8.95962379e-02,
           -2.53968635e-02, -1.38401224e-01, -2.66401891e-02, -3.33911002e-02,
           -5.20087744e-02,  1.54766613e-02,  1.63449322e-01,  1.96457648e-01,
            4.19831632e-03,  2.46611286e-01, -8.13378520e-03,  5.43401026e-04,
            7.06501601e-02,  8.24856486e-02])), (0.9337938560734337, array([-2.78666149e-04,  2.86153266e-02, -4.40329619e-02,  7.63570923e-02,
            3.35212441e-02,  1.20383512e-02, -1.94893707e-02,  2.69527911e-02,
           -5.20199285e-03,  6.19190179e-02,  1.12686446e-02,  2.49743679e-03,
           -7.47851523e-04, -6.50827399e-02,  6.04982626e-02,  3.50763726e-02,
           -2.70478021e-02, -9.41447301e-02, -4.79984129e-02,  7.06010461e-02,
           -1.57413633e-01,  1.78383556e-02,  4.43158321e-02, -1.81734767e-01,
            3.99219004e-03, -1.22994888e-01,  4.80848553e-02, -1.08445254e-01,
            1.47732168e-01, -1.72394451e-02,  1.07269968e-01, -2.70397426e-02,
           -2.00465373e-02, -2.29905336e-01, -3.09686448e-01,  1.01529457e-01,
           -2.86786596e-02,  4.32508641e-02,  2.90138653e-02,  1.79205933e-01,
            2.04568838e-01, -5.35395804e-02, -8.76425041e-02,  2.18448161e-01,
            6.14465619e-02, -8.95877060e-02,  5.63716438e-02, -1.66880716e-01,
            3.88398200e-02,  6.64525839e-02,  1.02053052e-01, -2.19362265e-01,
           -3.47961229e-02,  7.50291476e-02,  4.47602226e-03, -9.31343903e-02,
            2.14805378e-01, -4.51143042e-02, -2.67282618e-02,  6.72241207e-03,
           -1.36953903e-02,  6.76453603e-02, -3.86191907e-02, -9.30012005e-03,
            1.40203816e-02, -1.22670598e-01, -1.20398887e-01,  1.60151086e-02,
            1.16043517e-01,  2.22964993e-01,  2.96600305e-03,  1.32948538e-01,
            3.60556747e-03, -1.11277075e-01, -5.62721267e-02, -6.35223718e-02,
            1.17300309e-01,  5.81474467e-02, -8.29515328e-02,  5.69520136e-02,
            1.58439491e-01, -1.20199487e-02, -1.07114984e-01,  1.06543666e-01,
            2.92226824e-02, -8.56989836e-03, -3.23945528e-01, -1.02605111e-02,
            6.87594084e-02, -2.13386880e-01])), (0.931041145818125, array([-0.00585469,  0.01812109,  0.00792955, -0.02462631, -0.01451196,
            0.02830446, -0.00879729,  0.01597911, -0.03214228, -0.02254815,
            0.01669926, -0.02767143, -0.00122349,  0.04274492,  0.0346502 ,
           -0.00490572, -0.03843456, -0.11066957, -0.02813814, -0.03271021,
            0.14880528,  0.06438382,  0.01505054, -0.04131551, -0.01495056,
           -0.0418485 ,  0.04085931,  0.0137256 , -0.01961479, -0.05199247,
            0.02304338,  0.34772615, -0.13803342, -0.13670866, -0.21230403,
           -0.09764183,  0.04755794, -0.22810261,  0.04591375,  0.02773834,
            0.20986703, -0.00934168, -0.00424333, -0.04100463, -0.00221949,
           -0.05967108,  0.01744279, -0.23425454,  0.25326177,  0.02664948,
           -0.0048028 ,  0.0316391 ,  0.01544314, -0.01943449, -0.00194862,
            0.17549254, -0.25550851,  0.00912207,  0.06729272, -0.02334901,
           -0.06605862,  0.08964901,  0.06782998, -0.46031615, -0.01514615,
            0.03390418,  0.04328068, -0.02500638, -0.26223213, -0.02173067,
           -0.15071761, -0.11753421,  0.00625882,  0.06036338,  0.0669714 ,
           -0.01291813, -0.05330965, -0.02390962, -0.0820911 , -0.04149348,
           -0.0594575 ,  0.04543746,  0.15142232, -0.00184985,  0.00565584,
           -0.07747515, -0.04673792, -0.03077443, -0.02645897,  0.02890448])), (0.9160124331703235, array([-0.04512716, -0.0385672 , -0.0206561 , -0.04705618, -0.04212048,
           -0.01466519, -0.01829115, -0.04505448,  0.01255338, -0.04392882,
            0.12015178, -0.03515198, -0.08976007,  0.08473334,  0.0292259 ,
           -0.0184341 , -0.03226551,  0.0674858 , -0.02706551, -0.11339415,
           -0.01596481,  0.04625842, -0.11063261,  0.06854895,  0.03595167,
            0.10528534,  0.12790198, -0.10999625, -0.00719051,  0.08049302,
            0.02848249, -0.19192094, -0.00162064, -0.01302792, -0.11972141,
           -0.12932582,  0.0461804 ,  0.16420206,  0.02892848,  0.14381952,
            0.13948706, -0.05823622,  0.10181279, -0.3449726 , -0.04151092,
           -0.0870399 ,  0.01424011, -0.26918623,  0.06644839,  0.05707586,
           -0.04335126, -0.05749606, -0.02655391, -0.01128123,  0.08300746,
            0.01291072,  0.0271534 , -0.04632249, -0.03392848,  0.03656581,
            0.05312645,  0.03488761, -0.20652097,  0.20327968,  0.03210534,
            0.03886553, -0.02695497, -0.04171325, -0.03426775, -0.00089415,
           -0.21073789, -0.04495686,  0.19303697,  0.16995306,  0.06502183,
           -0.02814482,  0.10027648, -0.1487149 , -0.01703184,  0.27576032,
           -0.05856994, -0.2091439 , -0.05783323,  0.0506891 , -0.0645663 ,
           -0.09810442, -0.01194532,  0.20861665, -0.05955877,  0.23499399])), (0.9017616031502895, array([-0.01204613,  0.03157169,  0.00173901, -0.02522276, -0.02255628,
            0.08860315,  0.00054061,  0.02748577, -0.03401337, -0.15531234,
            0.11397098, -0.00209653, -0.10203533,  0.1247588 , -0.06968457,
            0.01105041,  0.07674368, -0.11018815, -0.01025301,  0.03199334,
            0.09125823,  0.05356348,  0.03618184, -0.15826721, -0.10560225,
            0.01025809,  0.03069366,  0.02691185,  0.08197771,  0.00534303,
            0.056395  , -0.36746288,  0.0181807 , -0.02410655,  0.01469366,
           -0.08056341,  0.03822262, -0.25860032, -0.0505407 , -0.04810135,
           -0.00116654,  0.05863885, -0.08239288,  0.0771162 ,  0.03592515,
           -0.09057108,  0.00783212,  0.02333246,  0.09206801,  0.16894537,
           -0.00259428,  0.00753512,  0.06255998, -0.03827773, -0.0486697 ,
            0.16976284, -0.03187038, -0.14708599,  0.06945362,  0.03166831,
            0.02937489, -0.24219907,  0.36128429,  0.25402849, -0.01963118,
            0.04604779,  0.03255906, -0.05974847, -0.16830735, -0.01446133,
            0.04296761, -0.069433  , -0.01778528,  0.14103144, -0.13537139,
           -0.02075788, -0.02933113, -0.03012077, -0.1153027 ,  0.12267673,
           -0.12071696, -0.09378854, -0.01146882,  0.1814344 ,  0.06288938,
           -0.0125343 , -0.05389026, -0.21635202, -0.03533361, -0.13628257])), (0.890644523647056, array([ 9.98843162e-03,  1.83753484e-02, -2.15618355e-02,  5.60384562e-02,
            1.90754629e-03, -4.93206812e-02,  3.44008833e-02,  1.92579847e-02,
            5.36319810e-03,  1.61861064e-01, -1.27813948e-01, -7.75819723e-02,
            2.87565785e-02,  2.07566454e-01,  2.24892437e-02,  1.65387794e-02,
           -6.30551458e-02,  5.50758222e-02,  8.31593820e-02,  1.60433647e-01,
           -3.29652843e-02, -2.71413980e-03, -5.05083252e-03, -1.54274608e-02,
           -2.22873819e-02,  3.65170615e-02,  2.51583619e-02,  5.74703905e-02,
            1.47873828e-01, -1.60874936e-01, -3.53703355e-01,  4.66269914e-02,
           -5.77386648e-02,  5.34731395e-02, -6.99392009e-02, -5.59667683e-03,
            1.50569071e-02,  1.55346729e-01,  5.42385274e-02,  2.98574646e-01,
            6.16475449e-02,  1.38453606e-01, -1.04225708e-01,  7.80865747e-02,
            2.68362391e-02,  1.41388217e-01, -6.26077147e-05, -8.67876826e-02,
           -1.64909818e-01, -1.17812824e-01, -3.00302056e-02, -2.76871244e-02,
            1.37548277e-01, -3.01084262e-02, -1.82363579e-01, -4.33397819e-02,
           -1.35010942e-01,  7.45199010e-02,  1.43697804e-04, -3.62370319e-03,
            8.65683739e-02, -1.18813821e-01, -1.84208722e-01, -1.43973498e-03,
           -1.65915130e-02,  8.35012256e-02,  4.85575694e-02, -1.41136110e-01,
           -1.23841638e-01, -1.27825175e-01,  1.58779130e-01, -7.77355540e-02,
           -1.07927145e-01, -9.25315807e-03, -4.69461310e-02, -8.77350934e-02,
           -1.55584716e-02, -8.70564735e-02, -2.14612552e-01,  2.16155162e-01,
            1.32878204e-02, -1.04393498e-01,  7.12127204e-02,  1.61393672e-01,
            2.70368961e-02,  4.29979717e-02,  1.53933935e-01, -5.87435978e-02,
            5.96815311e-02, -1.17995448e-01])), (0.8885676555317741, array([ 0.00274796, -0.02361445,  0.05116237, -0.0842552 , -0.07213576,
            0.00093838, -0.01098897, -0.02175948, -0.00658835, -0.15644416,
            0.21212145, -0.08688107, -0.12772555,  0.19739457,  0.00065999,
           -0.05611503, -0.09542822, -0.0790884 ,  0.07532689,  0.04622811,
            0.0199169 ,  0.07703434, -0.12052577,  0.0575971 ,  0.04508036,
            0.1558777 , -0.00821302,  0.18890517, -0.03258032, -0.193194  ,
           -0.25381167,  0.10040089,  0.02806504,  0.02635801, -0.03138048,
            0.06964833,  0.02135525,  0.03181669,  0.13739875, -0.08807277,
            0.02874702, -0.04365259,  0.11838292,  0.27886171, -0.20779585,
            0.08171919, -0.00790738, -0.00744908, -0.0837399 ,  0.20445817,
           -0.02006053, -0.01835591, -0.07266937, -0.02869563,  0.17928846,
            0.05813758,  0.0441099 , -0.07759448, -0.03301373,  0.08159966,
            0.07785581, -0.16431664,  0.01670631, -0.0194537 , -0.01016957,
           -0.06294145, -0.07168093, -0.00171829,  0.01843777,  0.00627979,
           -0.04178049, -0.05313962,  0.17234462, -0.27315857,  0.04703748,
            0.05148856,  0.05858007,  0.01267136,  0.0298947 , -0.26037273,
            0.07794181, -0.12729389, -0.03197504,  0.21431478,  0.01807618,
            0.02617634, -0.08542077,  0.11938408, -0.06302607,  0.11919231])), (0.8791588942184467, array([-0.03117837, -0.01399989, -0.03340375,  0.00088924,  0.02028349,
           -0.01951033, -0.03937196, -0.01919307, -0.02840681,  0.14604114,
           -0.04894041,  0.00851742,  0.04925042, -0.10984964, -0.00350001,
            0.05979627,  0.05638347,  0.05399111, -0.06529045, -0.07689977,
           -0.1253334 ,  0.04593203, -0.18111414, -0.11470011,  0.18498936,
            0.15281119,  0.16061652,  0.03052356, -0.31618665, -0.00304363,
            0.20344699,  0.11546958, -0.02264744,  0.10675221,  0.00771112,
            0.05565993, -0.06029402, -0.07399017, -0.06291728,  0.01991941,
           -0.03703263, -0.13888537,  0.06051865, -0.07519849, -0.16860843,
            0.01692649, -0.02738467, -0.15388506, -0.15641782, -0.12933231,
           -0.03753313, -0.08508395, -0.00065314,  0.04269448, -0.0422484 ,
           -0.11940561,  0.03233509,  0.19252331, -0.12759107,  0.01688613,
            0.05361439, -0.20145027,  0.26238912, -0.0368179 , -0.00501027,
           -0.03522915, -0.08001625, -0.02695852, -0.07996612,  0.03635574,
           -0.12670346, -0.03504763, -0.12295178,  0.02712415, -0.00411092,
           -0.0300398 , -0.05640326,  0.10016602, -0.13261971,  0.04293086,
           -0.05410893,  0.01850406, -0.05688299,  0.32093398, -0.04501955,
            0.29384958,  0.01334824, -0.05077861,  0.02474069,  0.03883143])), (0.8629753367296573, array([ 0.01166394, -0.04751393,  0.02226712, -0.01738976, -0.0372354 ,
            0.00137357, -0.03142243, -0.04260486,  0.01422279, -0.1007145 ,
            0.04963574,  0.04530055, -0.02641501, -0.04232224, -0.11693779,
           -0.02743814,  0.15277544,  0.18836382, -0.03718525, -0.06302018,
            0.02320708, -0.00930566,  0.04126506,  0.01878739, -0.04468988,
           -0.06594267, -0.05484857,  0.04120591,  0.02869672, -0.01306566,
            0.05889018,  0.05443091, -0.03439045,  0.02260685,  0.08644482,
           -0.02284593, -0.18626504, -0.12175179, -0.12165186,  0.50096173,
           -0.06641931, -0.01516339,  0.10534416,  0.01814603, -0.12160945,
           -0.03934602, -0.0093016 ,  0.14292649,  0.05077062,  0.00787034,
           -0.01718144,  0.03468688, -0.03991159,  0.00653545,  0.04821644,
            0.00913398,  0.06714187, -0.05698063,  0.0077161 ,  0.0090317 ,
           -0.03761309,  0.02592563,  0.14953846, -0.03722617,  0.00620756,
            0.11938813,  0.00961706,  0.05972683,  0.13493806, -0.01354354,
            0.01276413,  0.04573275,  0.08990901, -0.19106456,  0.04890457,
            0.03858561,  0.05644118, -0.09668513, -0.548203  , -0.01880534,
            0.07071005,  0.06407696, -0.00113584, -0.07058048, -0.02843993,
           -0.13582121,  0.05628283,  0.03813165, -0.01339753,  0.13068332])), (0.8317611004894274, array([ 0.00545156, -0.07008116, -0.03053225,  0.06334239, -0.00939906,
           -0.05711216,  0.02420308, -0.06509471,  0.06767608,  0.16041921,
           -0.06848737,  0.01600285, -0.01452638,  0.05829487, -0.00285141,
           -0.0621527 , -0.03677003, -0.15963981, -0.03683498, -0.07947628,
           -0.0100988 , -0.03931464,  0.10738339, -0.05091965, -0.08705204,
           -0.1497019 , -0.1584345 , -0.05739527,  0.24238019, -0.00924365,
           -0.01982462,  0.02188455,  0.02368897, -0.07832417, -0.02152104,
           -0.02948498,  0.08748718, -0.13049489,  0.02790162, -0.20781542,
            0.08399357, -0.12619883,  0.08092237,  0.03826677, -0.01464393,
            0.00819209, -0.09482382, -0.14170098, -0.141553  , -0.26439855,
           -0.08143666, -0.04212307, -0.09987588,  0.04378873,  0.11845842,
           -0.10995148, -0.0611874 , -0.03675344,  0.10142608, -0.04646941,
           -0.07001699, -0.03058676,  0.28216562,  0.02546009, -0.02403687,
            0.12041728, -0.01333376,  0.00929337,  0.14548328, -0.04950643,
            0.01893052,  0.06961991,  0.08997331, -0.16177626,  0.05114716,
           -0.13466767,  0.02926301, -0.18830137,  0.10727718,  0.17320939,
            0.11706974,  0.06495608, -0.08982689,  0.17787126, -0.04179447,
           -0.10593689,  0.23824305, -0.06316331,  0.06327437,  0.28536944])), (0.8087658281973401, array([-1.38037599e-02, -2.62538836e-02, -1.50064891e-02,  7.73353884e-04,
            5.58215569e-03,  3.85305662e-02,  4.85734662e-02, -2.74253271e-02,
            8.12131793e-03, -3.85170607e-02, -7.26463929e-03,  4.13665719e-02,
            8.12804341e-03, -7.43367589e-02,  2.48403179e-02,  1.01314548e-01,
           -5.67692097e-01,  3.38188062e-01, -1.75549813e-02, -3.81677187e-03,
            1.21591794e-02, -1.34692259e-02,  8.01078060e-02, -2.71126200e-02,
           -5.01367953e-02, -5.56049468e-02, -6.84804047e-03, -8.00401890e-02,
           -1.29583624e-02,  7.48112337e-02,  5.96831837e-02,  3.49400558e-02,
            3.59546531e-02,  1.17052541e-01, -3.22895551e-02, -1.20091688e-02,
           -3.68001001e-01, -3.27377391e-02,  5.25519139e-01, -5.54732492e-02,
           -9.28767847e-02,  2.24409625e-02,  1.66298363e-02, -4.90522909e-02,
           -4.74724762e-02, -2.35151595e-02,  2.44117022e-02, -3.14640906e-02,
           -2.60737436e-02,  3.45476294e-02,  1.28535207e-02, -1.72745096e-02,
            1.94282952e-02, -3.27232993e-03, -2.34476222e-02,  5.73397163e-02,
            9.32157164e-03, -7.73506024e-02,  2.56921031e-02,  2.98558779e-02,
           -4.61180858e-03, -6.72791909e-02,  1.15526687e-01, -1.48512807e-02,
           -1.64962985e-02,  6.39246379e-02,  1.09245648e-02, -4.17016748e-03,
           -1.24036206e-02, -1.39811565e-02,  1.60576603e-02,  1.25968178e-02,
           -3.82114539e-03, -2.84256479e-02, -1.49265660e-02,  2.17853452e-04,
           -3.03888764e-04, -3.57882575e-02,  1.10530819e-02,  7.29136841e-02,
           -2.73872115e-02,  2.33068073e-02, -2.51894381e-02,  4.08449419e-03,
            1.14798768e-02, -6.14097124e-02, -4.28706108e-02, -4.81625998e-02,
            1.97217871e-02, -5.97897486e-02])), (0.7892722212229611, array([ 0.01759243,  0.0244195 , -0.01348355,  0.05594109, -0.02822828,
           -0.03114862, -0.01025114,  0.02642081, -0.00898686, -0.42732225,
            0.05497189,  0.09793574, -0.00615247, -0.19597648,  0.10631042,
            0.27251081,  0.14753006, -0.00906053, -0.0057597 , -0.00494552,
            0.05043376,  0.12632815,  0.11116636,  0.13092532, -0.06112112,
           -0.09693479, -0.05446578, -0.14006325,  0.00753102,  0.02019177,
            0.00181738,  0.21916386,  0.17045392, -0.02379436, -0.12168054,
            0.00553675, -0.00742759,  0.07757025, -0.19196648, -0.05025151,
           -0.231689  ,  0.14095046, -0.02819515, -0.0134607 , -0.0707351 ,
            0.00763897, -0.02964745, -0.16497223, -0.19827882, -0.18079722,
           -0.0066776 , -0.05190613,  0.05535967, -0.00839952, -0.06733092,
            0.25211957,  0.17666072, -0.13722392, -0.04466507,  0.09203293,
            0.05474654, -0.09144626, -0.07607113, -0.06254102, -0.01293667,
            0.0762188 , -0.07111797, -0.01021933, -0.10496458,  0.01394139,
            0.05186032, -0.09578208,  0.00189994, -0.11034241, -0.0020095 ,
            0.00307013,  0.04406047,  0.02847866,  0.08419341,  0.14161595,
           -0.00614269, -0.04934289, -0.02857221,  0.07786195, -0.02494885,
           -0.03030585,  0.04267903, -0.03400083,  0.06690435,  0.00336231])), (0.7619517528659735, array([ 0.02689377, -0.05957791, -0.02589526,  0.09474024, -0.02102001,
            0.03012182, -0.04118114, -0.05106918,  0.04332153, -0.29746521,
            0.04675402,  0.06112399, -0.0113335 ,  0.02307008, -0.06525496,
           -0.27287067, -0.09734835, -0.0244697 ,  0.00126079, -0.00036962,
           -0.00616904,  0.22725556, -0.12359879,  0.05529858,  0.0652219 ,
            0.06904022,  0.031982  ,  0.08157967,  0.01682568, -0.01334902,
            0.02608926, -0.14114573, -0.20349955, -0.08650184,  0.10832692,
            0.00046514,  0.01815474, -0.09422992,  0.11832343, -0.03575713,
            0.28709648,  0.24770688, -0.08384164, -0.01241468, -0.05904321,
            0.01440831, -0.05734269, -0.09360715, -0.15191822, -0.17191821,
            0.00443259,  0.02042583,  0.10592989, -0.01977457, -0.16017916,
            0.00170861,  0.1962971 ,  0.20708513, -0.15433359, -0.09634164,
           -0.04421257,  0.22916174,  0.1700131 ,  0.03785226,  0.1072855 ,
            0.05827851, -0.01864558, -0.00840599, -0.03722675,  0.05147844,
            0.07801918, -0.0726303 ,  0.02782703, -0.07946549,  0.02506575,
            0.09877713,  0.03717162,  0.07143827,  0.02336953,  0.02438579,
            0.10413921,  0.00061904,  0.07713089, -0.18325313, -0.12507461,
            0.06179492,  0.05677923,  0.00796665,  0.12786405, -0.01064299])), (0.754916612665571, array([ 3.44821659e-02,  1.26829308e-01,  5.43327399e-03,  5.40251208e-02,
           -6.31772239e-03, -7.31096931e-02,  1.00194762e-01,  1.26311627e-01,
           -2.83309870e-02, -1.38370377e-01, -2.42896274e-02,  3.44449527e-03,
            3.27200264e-02,  3.32972459e-02,  3.54149565e-02, -2.48360446e-01,
           -1.19435469e-02,  1.12217435e-01,  2.54212469e-02, -3.23663707e-02,
            3.06979595e-02,  1.09563881e-02,  6.97932703e-02,  4.88264866e-02,
            2.38292722e-02, -1.34668990e-01, -1.80846307e-01,  8.10677045e-03,
            1.07050838e-01, -2.81961937e-02, -4.41994556e-02, -4.31431671e-03,
            1.40805904e-01,  9.43767166e-02,  9.03656398e-02,  3.75429265e-02,
           -1.45878411e-01,  4.79768652e-02,  6.34142899e-03,  8.05501059e-02,
            2.01874747e-01, -8.59169435e-02, -1.31662987e-01, -5.88583553e-02,
            3.81207580e-01,  6.76882097e-02, -6.34175310e-02, -1.04675300e-01,
           -1.41017598e-01, -5.40828206e-02, -2.52485663e-02, -4.05080992e-02,
           -1.23833066e-01,  5.85499444e-02,  1.34698798e-01,  1.27605442e-01,
            5.56295706e-02, -4.32121795e-02,  2.69422352e-02, -1.30918221e-02,
            3.10371534e-02, -7.44338840e-02, -1.96521751e-02,  1.76668316e-02,
           -1.26032960e-02, -2.26440295e-01, -1.03492450e-01, -5.83729646e-05,
           -8.81297653e-02,  1.30297831e-02, -5.17166653e-02, -1.00603415e-01,
           -4.24944869e-02,  2.26723469e-01, -2.88255891e-02, -3.69593029e-02,
           -4.74195219e-02,  1.69271688e-01, -1.43065221e-01, -1.81965117e-01,
           -3.44930629e-02,  1.20781113e-01, -6.03238202e-02,  4.87405557e-02,
            6.82943198e-02,  1.60096454e-01, -6.43351316e-02, -7.49687169e-02,
           -4.76429693e-03,  3.09960604e-01])), (0.7309008552453276, array([-3.12576183e-02, -3.00353136e-02, -1.15434524e-02, -3.74192404e-02,
            1.21194607e-02,  3.54390679e-02, -5.92220714e-02, -3.43419973e-02,
           -1.62954981e-02,  3.61814212e-02, -4.15691648e-02,  1.62172077e-02,
            3.11683856e-02, -9.99366833e-02,  2.28516084e-01, -4.59537346e-01,
            1.41672965e-01,  1.77388278e-01, -2.70709877e-03,  3.68925400e-02,
            4.19564801e-03, -1.19786356e-01,  9.82757133e-02, -2.37302746e-02,
           -1.15006306e-02, -1.82167491e-02,  3.53555482e-02, -8.99522902e-02,
           -1.05099269e-01,  1.40408070e-02, -3.63399133e-02,  9.96778586e-02,
            1.44669392e-01,  1.55519585e-02,  2.29991872e-01,  1.48939095e-01,
           -2.33116872e-01,  2.86760961e-02, -1.95486408e-01, -1.15693709e-01,
            4.00805322e-01,  2.27205418e-02,  5.11989738e-02, -1.45252118e-02,
           -1.52042970e-01, -1.81434799e-02,  8.15185742e-02, -4.65860405e-02,
            4.41958474e-02,  8.88374795e-02,  1.01053499e-02, -2.93229234e-02,
            3.75101120e-02, -2.43764479e-02, -1.78384967e-02,  1.00186522e-01,
           -4.09938703e-02, -9.78378753e-02,  1.45384629e-02,  8.39263450e-02,
            5.42777656e-02, -1.67621893e-01, -3.27909821e-02, -2.52681884e-02,
           -4.20184070e-02,  1.35099267e-01,  3.70030499e-02, -2.57640682e-04,
            2.09275366e-02, -1.31613049e-02,  1.01878028e-02,  3.99945612e-02,
           -6.81091768e-03, -9.42781981e-02,  9.02301120e-03, -2.31518100e-02,
            1.72141119e-02, -5.61493349e-02,  1.20112902e-01,  1.12640843e-01,
           -1.63524514e-02, -3.38260736e-02, -3.09316007e-03,  4.91087253e-02,
           -2.52284664e-02, -9.83192805e-02,  2.23107503e-02,  1.35065309e-02,
           -2.10648854e-02, -2.23614183e-01])), (0.6902086442370304, array([-2.48394712e-02, -6.54456851e-02, -6.04496784e-02,  5.97797123e-02,
            1.04453000e-02,  4.43533564e-02, -2.34111646e-02, -6.65301057e-02,
            1.68492557e-02,  1.19272280e-01,  1.81350517e-03,  6.97093156e-03,
            2.12508055e-02, -1.60698027e-02, -1.93840577e-01, -1.31447806e-02,
           -7.15273312e-02, -1.20965888e-01,  6.88178481e-03, -6.60825012e-04,
           -1.85468037e-02, -5.97973938e-01,  9.73324139e-02, -1.66199196e-02,
            8.01983100e-02,  2.45483333e-02,  3.08441845e-02, -4.48242011e-02,
           -9.00496982e-02, -5.94900724e-02,  6.40474671e-02, -2.86411959e-02,
           -3.14888366e-02,  6.02455867e-02,  4.18503179e-02,  1.80504114e-02,
            2.18824714e-01,  8.52678194e-03,  1.26993359e-01,  8.53242763e-02,
            2.12018084e-02,  7.16294977e-02, -8.95454956e-02,  6.51731177e-03,
           -1.56303498e-02, -1.06026249e-04, -3.25688168e-02,  6.19167864e-04,
            4.47910476e-02, -3.85713194e-02,  3.36407794e-02, -7.61597641e-02,
            7.70892141e-03, -9.53573105e-03,  2.38271484e-02,  5.23502403e-01,
            9.54240031e-02, -6.46799446e-02, -7.69163962e-02,  3.55080562e-02,
            4.00248106e-02,  2.41773244e-02, -2.64112052e-02,  1.34953270e-02,
            4.29121784e-02,  3.65390693e-03, -1.79335175e-02,  3.32345142e-02,
           -2.99637256e-03,  9.53443834e-02,  4.16274493e-02, -1.70963714e-02,
            2.96411797e-02, -3.01885058e-02,  4.34496481e-02,  6.73465056e-02,
            1.56924088e-02,  1.05070539e-01, -2.55381324e-02,  2.46675269e-03,
            5.41649939e-02,  3.12106893e-02,  9.12954266e-03,  8.48837089e-02,
           -1.92830666e-01,  7.03777360e-02,  1.22657643e-01,  6.04117042e-02,
            6.30270009e-02,  3.83421834e-02])), (0.6604534042250566, array([ 0.03493521,  0.00433818,  0.00421347,  0.05698903, -0.0472373 ,
            0.08277977, -0.04077489,  0.01079527,  0.04588983,  0.22098083,
            0.20979844, -0.09531005, -0.20616871, -0.00620906,  0.38560093,
            0.15308173,  0.12878625,  0.21170295,  0.01391932,  0.00681338,
           -0.02000289, -0.21287615, -0.12193815, -0.05404867,  0.0496134 ,
            0.09394178,  0.02479103,  0.11795467,  0.14014343, -0.02948085,
           -0.12000781, -0.05256568, -0.06624699, -0.14857877, -0.17420537,
           -0.08197003, -0.38631173, -0.04199609, -0.18837499, -0.1385434 ,
           -0.10178701,  0.17064317, -0.10241735, -0.01967789,  0.11111213,
            0.02463147, -0.00974989,  0.00286693, -0.04670847,  0.01330558,
           -0.02012631, -0.01593084,  0.01209422, -0.02779087,  0.02695335,
            0.08297272, -0.04745558,  0.06196023, -0.0145512 , -0.06916519,
           -0.0275976 ,  0.11834801,  0.15227742,  0.01652194,  0.06305596,
           -0.02901072,  0.01837811, -0.01971193,  0.00842294,  0.04959294,
            0.02020987, -0.00417164,  0.06980409,  0.03091795,  0.03161785,
            0.09845365,  0.02155261,  0.07654164,  0.03378785, -0.02283066,
            0.0701262 , -0.00271604,  0.07094519, -0.02398482, -0.15269574,
            0.04210218,  0.04026346,  0.01763493,  0.02093315,  0.11314927])), (0.6482152267621847, array([ 0.04976821,  0.10658763,  0.0956541 , -0.07532322, -0.05847024,
           -0.05523187,  0.07456164,  0.11014023, -0.05021298,  0.06017025,
           -0.0058527 ,  0.02331819, -0.03819477,  0.01330611, -0.00425335,
           -0.02067504,  0.00599168,  0.00552276, -0.01696796, -0.00386224,
           -0.02471487, -0.21103374, -0.09226116,  0.019868  , -0.01843372,
            0.10301034,  0.0947492 ,  0.05973184,  0.00468052,  0.06264263,
           -0.03086377, -0.07067192, -0.1095315 , -0.1011189 , -0.03080424,
           -0.03838276, -0.05651225, -0.03669107, -0.0314335 , -0.02534076,
            0.01485672, -0.23178877,  0.20503302, -0.03629108, -0.11232838,
           -0.00681464, -0.11182121, -0.16201526, -0.22460152, -0.17242559,
            0.05171118, -0.07330419, -0.00841712,  0.04114159, -0.03167827,
            0.18503452,  0.05815386, -0.12078235, -0.07831803,  0.08114397,
            0.00129601,  0.12729861,  0.06880663, -0.01546559,  0.07662265,
           -0.15732585, -0.01931486, -0.07002065, -0.04065681, -0.0829991 ,
            0.05445492, -0.00582084, -0.05749134, -0.08301152, -0.05502221,
           -0.08062502,  0.00127886, -0.09811222, -0.0351346 , -0.09271583,
           -0.07466982, -0.14215307, -0.07893433, -0.32281736,  0.48346038,
           -0.04718825,  0.07673822, -0.0884037 , -0.05632293, -0.13510046])), (0.5714116669727295, array([ 1.59600051e-02,  4.21114418e-02,  1.28461418e-03,  2.71529821e-02,
            3.42196303e-02,  1.54971693e-01,  1.19960622e-01,  4.28047863e-02,
            2.80131156e-02,  2.61894305e-01,  3.98224856e-01,  2.22430745e-02,
           -2.12989146e-01, -2.37918870e-01, -1.51039961e-01, -7.93097234e-02,
           -3.08705345e-02, -5.13930226e-02, -6.11104656e-03, -4.42445262e-03,
            8.73600832e-03,  1.14425648e-01,  9.10932453e-02, -2.46844763e-02,
           -4.11581153e-02, -4.09992595e-02,  2.33406505e-02, -8.65906300e-02,
           -8.25769878e-02,  2.64222833e-04,  7.82106040e-02,  2.64207688e-02,
            4.98407339e-02,  4.63747920e-02,  6.02988198e-02,  3.19919850e-02,
            9.85422558e-02,  2.08581457e-02,  2.67063403e-02,  5.80787317e-02,
            4.37215388e-02,  2.30841366e-01, -6.49068804e-02, -9.32392711e-03,
           -3.29560958e-02, -6.30860877e-03, -1.62708094e-01, -9.22755011e-02,
           -1.48081897e-01, -1.26421154e-01,  5.96946527e-02,  5.95035353e-02,
           -2.80467760e-02,  8.64978493e-03,  2.13743205e-02, -2.22829559e-01,
           -2.65768161e-01, -3.39638410e-01,  1.08326126e-02,  2.26939806e-01,
            9.86318206e-02,  4.80225690e-03, -3.33788664e-02, -9.39573658e-03,
            8.31612558e-02, -6.48915304e-02,  5.78423406e-02, -1.50498485e-02,
           -4.49476406e-03,  1.04890517e-02,  2.80642735e-02,  3.19174967e-02,
            1.41741391e-02,  6.45255334e-04, -1.34223998e-02,  1.02578208e-01,
            2.84638364e-02,  9.67061410e-02, -4.95645113e-02, -6.37995458e-02,
           -6.47332744e-03, -9.21554205e-02,  1.06503605e-01, -6.52962072e-02,
           -2.19855513e-02,  4.75944087e-02, -5.60596598e-03, -3.71788394e-02,
            9.41067553e-02,  2.11100251e-02])), (0.42504944989687043, array([-3.47767932e-02,  7.58451509e-02, -4.23429705e-02,  9.86716501e-03,
            1.59940061e-01, -2.70717662e-01,  1.77383051e-01,  6.49096231e-02,
            5.12056845e-02,  7.02072691e-02,  1.30791907e-01,  2.40911060e-03,
           -6.06508944e-02, -7.60844667e-02, -6.15267202e-02, -1.58919873e-02,
           -1.34459445e-02, -1.77913733e-02, -6.43477985e-03, -4.10655156e-03,
           -9.60136167e-03, -8.54788290e-02, -1.57451652e-01, -3.58176329e-02,
           -6.13724475e-02, -1.69618627e-02, -1.66075688e-02,  4.99590425e-03,
            1.02490121e-01,  7.75399505e-02,  5.23546073e-02,  3.21318129e-02,
            6.12294011e-02,  4.85299700e-02,  4.39752350e-02,  3.26550708e-02,
            6.30899239e-02, -1.77241962e-03, -1.34126947e-03,  6.32237248e-03,
            1.14247306e-02,  2.16675054e-01,  1.34977708e-01,  5.60382056e-02,
           -2.14084863e-02, -4.14393775e-03,  5.55328275e-01,  3.27885606e-04,
           -6.98772398e-03, -5.20532257e-02, -5.58848615e-01, -7.48481022e-03,
           -3.12871441e-02,  2.89332523e-02,  1.22703678e-02,  4.22965525e-02,
            1.80475928e-02,  2.41221529e-02, -1.97257540e-02, -2.97198520e-02,
            1.06193706e-02,  4.19201509e-02,  3.43960546e-02,  1.42779862e-03,
            3.66200453e-02,  4.72563016e-02, -2.81442245e-02,  1.92407077e-03,
           -5.40808902e-02, -2.68162723e-02, -1.76657479e-02, -2.53211115e-02,
           -1.78627652e-02, -2.81419084e-02, -2.40262612e-03, -2.32095273e-02,
           -4.49570115e-03, -2.05890542e-02, -1.64679051e-04, -2.21925687e-02,
           -3.75932025e-02, -1.62296398e-02, -5.76901005e-03, -4.50529220e-02,
            6.19451479e-02,  2.40252178e-02, -1.40397374e-01, -2.13950623e-02,
            7.27210216e-02,  6.17761287e-03])), (0.3978166894517955, array([-0.11005611,  0.05075236, -0.09624746, -0.0346809 ,  0.24849045,
           -0.26368587,  0.28140826,  0.02678829,  0.06076458,  0.04763941,
            0.11089526, -0.01547567, -0.03883996, -0.05911073, -0.02447412,
           -0.03007646, -0.00050185,  0.00710358, -0.00213599, -0.00313048,
           -0.01511068, -0.07451991, -0.29919188, -0.01310891, -0.09356173,
           -0.01210994, -0.03275207,  0.04013982,  0.15592938,  0.132279  ,
            0.08999955,  0.10598052,  0.10504277,  0.0389563 ,  0.04564958,
            0.02487026,  0.01432714,  0.01865657, -0.01893375,  0.00605514,
            0.01792934,  0.28197809,  0.19678471,  0.07381823, -0.01360361,
           -0.00726783, -0.38029588,  0.04512801,  0.09609548,  0.10470117,
            0.38468441, -0.02207943, -0.01634265,  0.01960711,  0.00246179,
            0.05325301,  0.08705457,  0.17774632,  0.04495314, -0.13924086,
           -0.03897458, -0.05755354, -0.02677295,  0.00089107, -0.06797897,
            0.06020379, -0.04883762, -0.00645087, -0.07838135, -0.04969979,
           -0.04826861, -0.0359796 , -0.01252585, -0.01454904, -0.01312105,
           -0.0546026 , -0.01316386, -0.07149409,  0.0161508 ,  0.05153606,
           -0.05670376,  0.01740911, -0.01657489,  0.04728431,  0.02907354,
           -0.02996158, -0.04353655, -0.00926233,  0.01796126,  0.01685783])), (0.35238333407115874, array([ 2.72172870e-01, -5.71486439e-02,  2.58216217e-01,  5.05159254e-02,
           -2.04062845e-01,  4.30256785e-01,  1.45763747e-01, -1.72881194e-03,
            1.51499040e-01,  6.40095103e-02,  1.82131063e-01,  9.17543931e-02,
           -1.26643385e-01, -1.76507321e-01, -1.16503634e-01, -4.72962966e-02,
           -3.04675728e-02, -5.48197726e-02, -3.20158781e-02, -1.50510244e-02,
           -1.22736041e-02, -3.80745058e-02, -6.50252743e-02,  1.23130614e-02,
           -2.72930734e-04,  9.95479012e-03,  1.29080008e-02,  1.73148004e-02,
           -4.74740301e-03,  2.57731002e-03,  1.92073566e-02,  2.77597161e-02,
            3.68733793e-02,  2.46648624e-02,  1.48229781e-02,  1.21683841e-02,
            2.71664328e-02,  8.47512627e-03, -8.89655041e-03,  7.07393823e-03,
            2.64322939e-03, -6.02993028e-02,  8.05093040e-02,  9.54459142e-03,
            1.44879125e-01,  7.48095238e-03,  4.13573441e-02, -6.52609644e-02,
           -4.99949244e-02, -5.31302586e-02, -8.08456803e-02, -2.23092074e-02,
            3.28519179e-02, -1.07182823e-02, -4.06721339e-02,  6.01641342e-02,
            1.35307018e-01,  3.11889795e-01,  1.75319796e-01, -2.24787076e-01,
           -1.42653467e-01, -2.41317120e-01, -1.45056322e-01, -3.60470006e-02,
           -2.73935664e-01,  3.35921547e-02, -1.05881773e-02, -2.23898076e-02,
           -8.04181260e-03, -4.15624181e-02,  1.37560250e-02,  1.81325894e-05,
            4.01312448e-02,  1.56110706e-02, -1.81586891e-02,  6.98493209e-03,
            2.21580621e-02, -5.27720713e-02, -1.73222731e-02, -5.76305220e-03,
            3.77462940e-03, -1.69856197e-02, -1.57448527e-02, -7.42518745e-03,
            3.44003188e-02, -1.07208876e-01, -3.22301219e-02,  3.40912034e-03,
           -1.51647621e-02, -6.78213851e-02])), (0.337764060513702, array([-0.10267333, -0.29443803, -0.0915461 , -0.02929046, -0.01193299,
            0.13856508,  0.76303677, -0.2975754 ,  0.17926268, -0.05497305,
           -0.15320546, -0.03482298,  0.08583347,  0.10016258,  0.09579815,
            0.03742155,  0.05589383,  0.01959031,  0.00804244,  0.00642888,
            0.00549271,  0.02367011,  0.16449441,  0.00371798,  0.03479047,
            0.03756304,  0.02415189, -0.0281378 , -0.09282584, -0.05453758,
           -0.05188437, -0.05920363, -0.0772278 , -0.02662847, -0.03024676,
           -0.01297158, -0.02741065, -0.0048173 , -0.09995176, -0.01941201,
           -0.0087679 , -0.04618292,  0.01019038, -0.01331143, -0.05073854,
            0.00250184,  0.01289621, -0.00670389, -0.01413956, -0.00291049,
           -0.02149748, -0.02609368,  0.0018284 , -0.00596198,  0.01653413,
           -0.00368353, -0.02215825, -0.04566043, -0.03991345,  0.04523418,
            0.01758591,  0.0493502 ,  0.02356299,  0.00854587,  0.04467478,
           -0.03361031, -0.00296489,  0.00296015, -0.03464237,  0.0162329 ,
            0.0092337 , -0.01099873,  0.01092503, -0.00199202, -0.00341072,
            0.03307768,  0.0134801 ,  0.03392747, -0.02049286, -0.02969398,
           -0.00254682,  0.02844248,  0.01158971, -0.02635988, -0.00291389,
            0.02692371, -0.12320224, -0.01920371,  0.01894774,  0.01093196])), (0.24537240875649707, array([ 0.33346055,  0.05539091,  0.28223949,  0.12145879, -0.07080231,
           -0.54679222,  0.05434262,  0.11623824,  0.25216066,  0.04192859,
            0.03877752,  0.02991654, -0.03311103, -0.04946977, -0.0402435 ,
           -0.02243829, -0.01467914, -0.00918881, -0.0232997 , -0.01064298,
            0.01051628,  0.05725223,  0.33682475, -0.00087466,  0.08132144,
            0.09169357,  0.05327385, -0.08259453, -0.2000579 , -0.09646241,
           -0.10150003, -0.12684152, -0.17657528, -0.06797636, -0.04195422,
           -0.03898997, -0.07239586, -0.02050143, -0.02096775, -0.02422455,
           -0.00828482, -0.00810232,  0.25424483,  0.02165218,  0.09415622,
            0.00784175, -0.04497648,  0.03142562,  0.0495527 ,  0.04873873,
            0.07287478,  0.00271661,  0.01906564, -0.01831662, -0.00271585,
            0.00710061, -0.00067786, -0.01882985,  0.00452397,  0.01880303,
           -0.00790012, -0.01824323,  0.00445034, -0.00215321, -0.01528222,
            0.06074812, -0.00679641,  0.02165929, -0.02416152,  0.01245209,
           -0.00502446, -0.01888459,  0.01762397,  0.02308425,  0.00868448,
            0.03326563,  0.02750681,  0.04028162,  0.01820977,  0.0451108 ,
           -0.01384362,  0.03892947,  0.01754462,  0.05322128, -0.13580873,
            0.02668102, -0.02897502, -0.00206738,  0.0692191 , -0.04783141])), (0.2105885520158932, array([-3.10827511e-02,  1.13819233e-01, -2.59825677e-02, -1.18901436e-02,
            6.81701205e-01,  3.06343714e-01, -1.26543027e-01,  1.01456929e-01,
            9.27070651e-02, -1.32567215e-02, -4.05979079e-02, -1.17258673e-02,
            2.21670119e-02,  2.38461476e-02,  4.04458947e-02,  2.00715548e-02,
            1.69954253e-02,  1.89284684e-02,  5.34251756e-03,  5.52036179e-03,
            7.37334954e-03,  3.87231506e-02,  2.99418215e-01, -4.13107676e-03,
            8.31918236e-02,  8.00694576e-02,  4.75864449e-02, -5.59973043e-02,
           -1.92155780e-01, -1.10033864e-01, -9.15510655e-02, -1.07284028e-01,
           -1.41548109e-01, -4.97663771e-02, -3.94930698e-02, -2.29170359e-02,
           -3.44465298e-02, -4.57614518e-03, -1.30644373e-02, -7.07281585e-03,
           -1.16763094e-02,  1.65801688e-01,  8.82661150e-02,  3.80122928e-02,
           -3.37033427e-02, -6.39047241e-04,  2.76063573e-03, -1.30101566e-02,
            1.93777474e-03, -2.38012823e-03, -2.63197535e-02,  3.26343809e-03,
           -9.28626432e-02,  3.89161237e-02,  9.71798149e-02,  2.76026138e-02,
            7.80342774e-02,  9.49966701e-02,  2.93796276e-02, -6.03171537e-02,
           -4.85341294e-02, -4.54767224e-02, -2.95147400e-02, -4.82907511e-03,
           -7.11207984e-02, -1.90243245e-02, -2.03019810e-02, -1.67590552e-02,
           -1.26001114e-02, -3.31330276e-02, -2.24885219e-02,  4.54737692e-03,
           -2.60866996e-02, -3.48038740e-02, -1.82847898e-02, -6.46713273e-02,
           -2.47287306e-02, -5.94244019e-02, -4.63910905e-03, -2.27185408e-02,
           -2.22768472e-02, -5.12580769e-02, -6.51606223e-02, -4.44153155e-02,
            1.35305829e-01, -5.71161658e-02,  1.20859100e-02, -2.49796205e-02,
            1.07341383e-01,  1.86479045e-01])), (0.1947049465927752, array([-3.13869905e-01,  1.80486031e-01, -2.65715965e-01, -1.14222059e-01,
           -3.14658867e-01,  2.07415000e-01, -7.25748995e-02,  1.10132718e-01,
            2.73851128e-02, -2.79589005e-02, -6.61198858e-02, -2.94104068e-02,
            4.76871738e-02,  5.84940338e-02,  2.88044602e-02,  2.15043113e-02,
            7.68030166e-03, -5.12020248e-03, -5.74688081e-03, -2.59236090e-04,
           -7.45518168e-04,  6.28118296e-03,  5.26188188e-02, -1.97042248e-03,
           -3.47192829e-03,  4.47694529e-02,  4.15371618e-02, -6.65222015e-02,
           -6.05610881e-02,  6.49012625e-03, -7.36300556e-04, -2.45104237e-02,
           -8.83109239e-03,  5.97309951e-03, -5.23943333e-03,  1.80728351e-03,
            8.64929209e-03, -2.23466435e-03,  6.52532347e-03,  6.87686189e-03,
            3.35876540e-03,  8.63268838e-02,  6.05110803e-01,  1.02871165e-01,
            3.29692003e-01,  4.14976067e-02, -1.15587468e-02, -8.60571370e-03,
           -1.93268840e-02, -1.58919638e-02, -1.88075817e-02, -4.20552528e-03,
            1.36702370e-02, -8.55654201e-03, -1.25536852e-02, -1.46180593e-02,
           -2.38149310e-02, -2.39224795e-02, -2.00389712e-04,  4.77005636e-03,
            1.56076319e-02,  1.42342996e-02,  8.27201823e-03,  5.16918033e-04,
            2.24203268e-02,  1.39543962e-02,  1.51098959e-03,  9.13492250e-03,
           -6.92233410e-05,  5.79228782e-04,  2.28272273e-02, -1.40926361e-03,
            3.42544640e-02,  1.98555635e-03,  1.44637639e-03,  4.26271148e-02,
            2.96759892e-02,  2.97914223e-02, -1.21144395e-02, -5.41792673e-03,
            4.83805949e-03,  6.05804192e-02,  4.77400799e-02,  1.42403755e-02,
           -7.40234918e-02,  3.17133754e-02,  1.40296135e-02,  7.85101481e-03,
            2.74875888e-01, -5.90879502e-02])), (0.12332392942132495, array([ 4.53435101e-02, -2.78034922e-03, -3.13445963e-01,  6.30711864e-01,
            1.27697456e-02,  6.64868949e-02, -5.75087081e-03,  6.07346566e-03,
           -2.61678153e-02, -9.93067403e-03, -7.91752836e-04,  1.70733274e-02,
           -2.73025573e-03, -2.26305080e-02, -8.96305492e-03, -3.60634540e-04,
           -8.49617743e-03, -9.80700990e-03, -2.24852279e-04,  4.09108102e-03,
            2.05637292e-03,  9.24051832e-03,  2.42592699e-02, -3.64164001e-03,
            1.68472969e-02,  2.54693542e-02,  5.05075770e-03,  5.16173513e-03,
           -4.83468454e-02, -1.21900774e-02, -9.11780503e-03, -5.05512123e-03,
           -7.41232540e-03,  3.91398550e-03,  3.06965071e-03, -5.90932528e-03,
            1.12445860e-03,  4.66719907e-03,  7.76292483e-03, -4.68010209e-04,
            4.57085333e-03,  6.98203998e-02,  2.23256968e-01,  6.03967088e-02,
            5.41465239e-02,  6.50298816e-03,  1.54786976e-03, -6.12458026e-03,
           -1.21478016e-02, -2.36115428e-02, -1.44891907e-02,  1.00069594e-02,
            1.28721599e-02, -2.86915970e-03, -2.29801640e-02, -1.27419711e-02,
           -1.79027671e-02, -2.41362676e-02, -7.78849976e-03,  1.25177575e-02,
            1.22658786e-02,  2.01564609e-02,  6.24882932e-03, -2.94930421e-03,
            2.19246690e-02, -1.21455484e-02, -3.64289500e-03, -2.47392662e-03,
           -5.05322234e-03, -7.04776467e-03,  2.60680790e-03,  1.63317314e-03,
            1.57057505e-03, -6.94289571e-03, -9.13210988e-03, -5.14828300e-03,
           -2.05609675e-03, -1.55565554e-02, -1.56562532e-03, -6.52763569e-03,
           -3.96162744e-03, -5.70459097e-04,  7.76754226e-04, -1.51831525e-03,
            9.68083322e-04,  6.81558268e-04, -1.87228840e-03, -2.19265410e-03,
           -6.51541156e-01, -1.35251471e-02])), (0.07848642552981931, array([ 1.84188359e-01, -3.05870171e-01,  1.49198471e-01,  7.87807951e-02,
            1.22599538e-01, -4.03112767e-02,  3.34042236e-02, -2.53340913e-01,
           -7.77881450e-01,  4.92023239e-03,  3.09791110e-02,  5.48255151e-03,
           -2.82090547e-02, -1.27406253e-02,  1.15325868e-03,  4.43431341e-03,
            6.29308236e-03,  1.45105664e-03, -1.82329674e-03, -7.60767616e-04,
           -8.41740602e-04,  5.47909104e-03,  9.09192366e-03, -1.39850991e-03,
            2.32559995e-02,  2.59952546e-02,  2.72665779e-02, -6.95731563e-03,
           -3.79661683e-02, -3.82250001e-03, -7.30620065e-03, -8.43124545e-03,
           -2.49057213e-02, -1.13018454e-02, -8.81137961e-03, -7.00195233e-03,
           -1.75154527e-02, -4.03319983e-03, -6.31246022e-03, -3.38662606e-03,
           -1.90484877e-03,  8.86981906e-02,  2.13060832e-01,  6.32707375e-02,
            2.35421226e-01,  2.89764159e-02,  1.67968346e-03, -3.84168000e-03,
           -3.03666587e-03, -1.83109292e-03, -8.57071972e-04, -4.31219406e-03,
           -6.20497375e-03,  6.97048660e-03,  1.14633859e-02, -2.87600954e-04,
           -3.78304190e-03, -1.95330813e-02, -5.22629574e-03,  2.04417140e-02,
            4.52053030e-03, -7.15957141e-03, -2.63864754e-03,  2.18279169e-03,
            2.14227840e-05, -5.32141382e-03, -5.85467921e-03, -1.59644921e-02,
            2.70014938e-03, -1.34069282e-02, -1.25762248e-02,  2.80874041e-03,
           -3.06062185e-02,  3.51151122e-04, -6.19744668e-03, -3.97887545e-02,
           -2.64247400e-02, -3.69463581e-02,  7.20491728e-03,  3.56915543e-03,
           -6.49136832e-03, -2.98564240e-02, -3.13811801e-02, -1.69712136e-02,
           -3.22039697e-02, -2.11954676e-02, -2.95472599e-03,  3.49282285e-04,
            1.61455654e-01,  1.24901710e-02])), (0.051763400830042, array([-1.88697768e-02,  1.99153916e-02, -1.50992854e-02, -8.39538161e-03,
           -6.13134826e-02,  1.27543697e-02, -3.74091591e-03,  1.51760514e-02,
            6.56315244e-02,  1.01737069e-03,  8.89451723e-03, -1.99027424e-03,
           -5.68317612e-03, -2.06171266e-04, -1.40521699e-03, -1.39625680e-03,
            7.59116202e-04, -5.96010680e-04,  4.60939872e-04,  1.45203641e-04,
           -3.86390842e-03, -3.24537563e-04,  3.86384430e-04,  7.11164706e-04,
           -2.66644623e-03, -3.15822443e-03, -2.77303177e-03,  1.65647745e-03,
           -2.20855060e-03,  1.16677604e-03, -1.06345034e-03,  4.03579179e-03,
            6.48850641e-03,  3.26805997e-03,  2.17878795e-03,  1.10405224e-03,
            1.41902792e-03, -1.79237047e-05, -2.92586910e-03, -4.04844594e-03,
            6.73236709e-04, -8.25972214e-03, -1.64017263e-02, -4.90970826e-03,
           -2.56431608e-03, -2.54648914e-04,  1.61465231e-03,  3.32298891e-03,
            3.68933548e-04,  9.76030252e-05, -3.00978027e-04,  2.25115748e-03,
            8.40532033e-03, -6.25251233e-03, -1.82606624e-03, -1.01219485e-03,
           -1.22428377e-02, -1.11400675e-02, -5.78214933e-03,  8.73957810e-03,
            4.98077086e-03,  9.74562202e-03,  4.18968632e-03, -2.00213912e-04,
            1.00137419e-02, -2.75230220e-01, -5.60595261e-02, -1.12764402e-01,
           -5.78889079e-02, -9.77460257e-02, -1.01178443e-01, -4.26695112e-02,
           -2.24173542e-01, -1.99158429e-01, -1.34496154e-01, -2.61734132e-01,
           -2.48290120e-01, -1.79693938e-01, -4.12659042e-02, -1.21238110e-01,
           -9.89963288e-02, -2.26076602e-01, -3.01991152e-01, -2.02651620e-01,
           -5.99178341e-01, -1.34591980e-01, -4.30616081e-02, -1.15515162e-01,
            9.39791440e-03, -7.74882426e-03])), (0.0022475672476950743, array([ 8.73743164e-04, -1.66423040e-03,  7.45416458e-04,  3.07977559e-04,
            1.72861907e-03, -2.04540557e-03,  6.82206778e-04, -1.40305294e-03,
           -2.72609621e-03, -1.42354356e-03,  7.09013571e-05, -2.25955695e-05,
            7.30409774e-05,  2.09513937e-04, -6.28938505e-04,  8.33854395e-05,
            2.71834078e-05,  2.73636459e-05,  2.77647997e-05,  4.63309898e-06,
            2.73807988e-04,  1.05254109e-03, -2.15173325e-04,  1.10141058e-04,
           -3.63394974e-04,  1.89895596e-04,  2.13911999e-04, -2.12449614e-04,
           -1.48090129e-04,  3.83683190e-05,  3.95015250e-05, -6.40889697e-05,
           -2.15752268e-04,  3.13663774e-05,  5.52593117e-05, -1.68659188e-06,
            1.24672096e-04, -3.22821095e-05,  4.53533751e-06,  1.52531961e-05,
            2.96810024e-05,  2.66726327e-04, -3.15592018e-04,  1.24963156e-04,
            2.91945427e-04,  3.44997444e-05, -4.49126982e-04, -3.91767013e-04,
            3.18626012e-04,  2.06652903e-04, -1.15869167e-03, -1.19228002e-01,
           -6.73506700e-01, -6.19958284e-01, -3.84399876e-01, -2.63273327e-05,
           -4.93419905e-03, -1.18597689e-03,  2.38658754e-04,  1.05118521e-03,
            4.73676341e-04,  4.94709946e-04,  3.05776306e-04,  6.32243095e-05,
            7.23131883e-04, -4.31812813e-04,  2.56914039e-04, -1.10376292e-04,
            7.39428235e-05, -4.68659075e-05,  1.93069842e-05,  7.41033672e-05,
           -3.28099516e-04, -2.30607646e-04, -7.37500178e-04, -2.47522772e-04,
           -6.48623696e-04, -2.00731100e-04,  6.41467503e-05,  5.37530518e-05,
           -7.53452293e-05, -3.79687489e-04,  1.58121247e-04, -1.08978707e-04,
           -1.66192458e-03,  4.65935481e-05,  1.81179430e-04, -1.10096449e-04,
            1.35285991e-03,  3.28324358e-04])), (0.0009933514215786233, array([ 1.53628210e-04, -6.22361736e-04, -4.57054309e-06,  2.90935816e-04,
           -2.10106863e-04,  1.00204884e-04,  2.17499715e-04, -5.57897504e-04,
           -1.39568550e-03,  1.05339167e-01,  3.79239031e-01,  5.45831880e-01,
            5.02591619e-01,  2.61126125e-01,  1.08363182e-01,  3.79714888e-02,
            2.14380574e-02,  1.61903934e-02,  1.14422809e-02,  8.09929611e-03,
            7.40576108e-03,  2.66280569e-02,  1.98802392e-01,  1.10899671e-02,
            1.30437900e-01,  1.77213342e-01,  1.47040349e-01,  1.47554793e-01,
            2.16513272e-01,  1.10090824e-01,  8.76644327e-02,  7.05251496e-02,
            7.71524303e-02,  3.13236194e-02,  2.52734757e-02,  1.43036488e-02,
            2.08533768e-02,  5.21462344e-03,  7.36003888e-03,  6.38283136e-03,
            3.69168397e-03,  1.23179683e-05,  1.01732980e-03,  2.06129745e-04,
            1.45626634e-03,  4.51948859e-03,  1.02505841e-05,  1.06353364e-06,
            7.59423127e-04,  4.73252908e-05,  3.53150845e-05,  5.34980746e-04,
           -9.21386327e-05, -1.81679508e-04, -1.39970011e-04,  1.93297011e-03,
            5.53199015e-05,  1.82548204e-04,  7.77840325e-04,  4.21404278e-04,
            2.32995847e-05,  3.22842741e-05, -9.13394568e-06,  6.90184011e-07,
            3.49610347e-05, -4.14575111e-04, -1.47745097e-04, -1.84970570e-04,
           -2.31609979e-04, -1.85588148e-04, -1.81753881e-04, -1.27181386e-04,
           -3.94305437e-04, -3.18831514e-04, -2.19945839e-04,  7.05051980e-05,
           -4.52457928e-04, -2.64528565e-05, -4.66528170e-05, -1.51408972e-04,
           -2.01521049e-04, -3.48861044e-04, -5.67246400e-04, -2.76629167e-04,
           -1.07199259e-03,  9.37205376e-04, -6.20598211e-05, -1.53413905e-04,
           -1.32020174e-04, -1.10134760e-04])), (0.0001285036483623192, array([-1.28878123e-04,  2.02334342e-05, -1.98617012e-04,  1.09363369e-04,
            2.14764133e-04,  2.23521491e-04, -1.43039095e-04, -5.62481330e-06,
           -2.80984771e-04,  5.46493983e-02,  1.96852465e-01,  2.83402156e-01,
            2.60788652e-01,  1.35444226e-01,  5.62126533e-02,  1.96921722e-02,
            1.11068787e-02,  8.39898897e-03,  5.94223765e-03,  4.20047060e-03,
           -1.42471248e-02, -5.25585800e-02, -3.82741827e-01, -2.13698723e-02,
           -2.50747229e-01, -3.41163519e-01, -2.83196435e-01, -2.84414624e-01,
           -4.17342376e-01, -2.12227247e-01, -1.69255898e-01, -1.36025496e-01,
           -1.48792418e-01, -6.03392197e-02, -4.87875548e-02, -2.75823143e-02,
           -4.02682463e-02, -1.00689806e-02, -1.42375864e-02, -1.23355135e-02,
           -7.12337590e-03,  9.68627459e-05,  1.52790204e-04,  4.97796947e-05,
            1.16390050e-04, -1.09305830e-03,  2.32283741e-05, -3.69245654e-05,
           -2.08640398e-04, -2.08468889e-05, -2.73325647e-05,  3.09570464e-04,
            1.12712030e-05,  6.72111137e-06,  4.97753122e-05,  1.44149409e-03,
            1.38348556e-03,  3.95272055e-03,  6.47066837e-03,  5.84691125e-03,
            1.93258877e-03,  1.09953636e-03,  4.14830539e-04,  7.20333669e-05,
            2.33433608e-03,  1.50046037e-04, -8.99742286e-07,  5.09254547e-05,
           -3.69858670e-05,  3.66282565e-05,  7.63683400e-05, -7.89082876e-06,
            1.09914137e-04,  7.57720383e-05,  6.72733706e-05,  1.36273518e-04,
            1.23933000e-04,  2.45637766e-04,  2.42022541e-05,  6.39095337e-05,
            4.85970167e-05,  1.10251470e-04,  1.34102319e-04,  8.35564438e-05,
            4.47564514e-04,  1.36209850e-04,  5.51574641e-05,  6.13542319e-05,
           -3.51101653e-05,  5.54513489e-05])), (8.546833259599872e-05, array([-2.08332763e-04,  2.31838890e-04, -2.32036055e-04,  2.13627016e-05,
           -3.21972859e-04, -3.28785753e-05, -1.44691509e-05,  1.78841317e-04,
            3.08667982e-04, -1.27699843e-04, -2.38053889e-03, -3.47639669e-03,
           -3.15684072e-03, -1.61671008e-03, -6.72242549e-04, -2.34562134e-04,
           -1.34216500e-04, -1.01989746e-04, -6.92548070e-05, -4.77069908e-05,
            1.28954766e-04,  1.82467528e-03,  3.69931946e-03,  1.86164953e-04,
            2.40342441e-03,  3.29844614e-03,  2.71115400e-03,  2.74510258e-03,
            4.07991636e-03,  2.06063416e-03,  1.66397221e-03,  1.34137927e-03,
            1.48732389e-03,  6.03148471e-04,  4.85348750e-04,  2.75466231e-04,
            4.07980815e-04,  1.01221155e-04,  1.47168736e-04,  1.25627378e-04,
            7.56742638e-05, -5.29424996e-05,  8.91831971e-05, -8.22435331e-07,
            3.13514097e-05,  1.22806426e-05, -7.69374333e-05, -8.32139535e-06,
           -5.22312387e-05, -3.01101714e-05, -7.96754176e-06, -2.62440143e-05,
           -3.81164553e-05, -8.87918641e-05,  1.50166487e-04,  4.25264035e-02,
            1.30542379e-01,  3.84450432e-01,  6.32638568e-01,  5.75707304e-01,
            1.90209340e-01,  1.06746617e-01,  3.98898116e-02,  6.44077334e-03,
            2.28826068e-01, -5.74563925e-05, -5.80535539e-06, -5.59460932e-06,
           -8.47758206e-05, -1.59189210e-05,  7.39386200e-04, -1.86751752e-05,
           -2.45961895e-06, -3.30313631e-07, -2.55624742e-05,  8.58763777e-06,
           -2.34121617e-05,  3.01494013e-05, -1.62412636e-05, -3.43000551e-05,
           -5.75614077e-06,  7.25723622e-06,  1.39889108e-05, -1.43744435e-05,
           -1.70305297e-04,  3.63750540e-05, -6.76543807e-05, -3.14403423e-05,
           -9.10703227e-05, -8.62663594e-05])), (3.57406919508468e-15, array([ 2.11575486e-02, -1.58290018e-02, -2.31031029e-02, -1.32339463e-02,
           -1.89671464e-15, -4.50856253e-16, -2.47485440e-16,  1.67721240e-02,
           -6.87602672e-18,  1.66117623e-13,  5.83720984e-13,  8.40294151e-13,
            7.73710867e-13,  4.02040948e-13,  1.66488434e-13,  5.83168521e-14,
            3.32222490e-14,  2.49726506e-14,  1.75715810e-14,  1.24680774e-14,
           -3.05980931e-14, -1.02417681e-13, -8.22521604e-13, -4.60839231e-14,
           -5.39098389e-13, -7.33497436e-13, -6.08832099e-13, -6.11607489e-13,
           -8.97321584e-13, -4.56141407e-13, -3.63870351e-13, -2.92295232e-13,
           -3.19878831e-13, -1.29643197e-13, -1.05080017e-13, -5.93866341e-14,
           -8.60871852e-14, -2.14069617e-14, -3.08313969e-14, -2.67523519e-14,
           -1.55999860e-14, -3.68177357e-16,  9.83300189e-16, -2.88448726e-17,
            1.17248397e-15, -8.87031146e-16, -5.82693334e-16, -1.15822815e-16,
           -7.13689134e-16, -2.60166268e-16, -2.90415081e-16, -6.04816455e-15,
           -3.87908532e-14, -3.59788680e-14, -2.04042558e-14,  3.36160503e-13,
            1.02335089e-12,  3.01489393e-12,  4.96135057e-12,  4.51465170e-12,
            6.15185598e-01,  3.45170989e-01,  1.28946998e-01,  2.08244423e-02,
           -6.95452578e-01, -1.17725314e-15, -2.14993045e-16, -1.88381812e-16,
           -1.07716607e-15, -1.92010054e-16,  5.71067215e-15, -1.84127565e-16,
           -6.35501344e-16, -4.92896282e-16, -3.09190140e-16, -2.45488394e-16,
           -9.10601070e-16,  1.07212288e-16, -2.11995340e-16, -5.21567041e-16,
           -4.83790506e-16, -7.36752799e-16, -9.05612969e-16, -6.78891034e-16,
           -3.07509446e-15,  7.67017500e-16, -3.25664494e-16, -3.37512170e-16,
            4.22647841e-16, -1.53183588e-16])), (-5.203296557290081e-16, array([ 3.95776120e-01,  5.95350761e-01, -2.60441848e-01, -1.49186602e-01,
            1.92539649e-15, -7.15780497e-16,  1.14891672e-15, -6.30822900e-01,
           -3.29257908e-15,  4.67313577e-14,  1.76804996e-13,  2.53917221e-13,
            2.33699003e-13,  1.21103520e-13,  5.04820000e-14,  1.73684194e-14,
            1.00907288e-14,  7.40941579e-15,  5.25033951e-15,  3.63889926e-15,
           -1.92335010e-14, -7.64211538e-14, -5.14525154e-13, -2.85113552e-14,
           -3.36988883e-13, -4.58831878e-13, -3.80916894e-13, -3.82484085e-13,
           -5.61969388e-13, -2.85650684e-13, -2.27840712e-13, -1.83258941e-13,
           -2.00699534e-13, -8.12421295e-14, -6.55676014e-14, -3.71603918e-14,
           -5.46295728e-14, -1.35999504e-14, -1.93399333e-14, -1.67395955e-14,
           -9.51656615e-15,  4.15643054e-16,  2.39123077e-16, -2.40023495e-16,
            4.57368674e-16, -2.19071205e-15,  8.04654859e-16,  1.91213598e-16,
            2.23220914e-16,  2.44529910e-16,  4.61959783e-17,  6.51377538e-15,
            3.49687157e-14,  3.24735983e-14,  1.92711779e-14, -1.91410281e-13,
           -5.90311992e-13, -1.74015128e-12, -2.86353645e-12, -2.60610537e-12,
           -1.84157089e-02, -1.03327654e-02, -3.86005521e-03, -6.23384014e-04,
            2.08185177e-02,  5.82569889e-16, -1.27445237e-17,  1.40344000e-17,
            3.79983681e-16,  7.56614939e-17, -3.43738723e-15,  8.31006203e-17,
            8.20325305e-17,  1.13494052e-16,  2.93170621e-16, -1.51020000e-16,
            3.11752960e-16,  1.58851686e-16,  1.59089842e-16,  4.46618813e-17,
            6.60833994e-17, -8.21858778e-17, -1.70396109e-16, -1.16447521e-16,
            1.30103395e-15, -3.37879073e-16,  1.11958831e-16,  2.80088150e-16,
            2.49661075e-15,  3.14516975e-16])), (-1.832295599444726e-15, array([-3.66771361e-01,  5.30133370e-01,  4.49762306e-01,  2.57633367e-01,
           -1.00010513e-15,  1.36101936e-15, -5.96497519e-16, -5.61719731e-01,
            2.96800585e-15,  2.21860401e-13,  8.01561027e-13,  1.15431277e-12,
            1.06242717e-12,  5.51661224e-13,  2.28765043e-13,  7.99484576e-14,
            4.53604094e-14,  3.40503053e-14,  2.42471615e-14,  1.70529159e-14,
           -5.43693189e-14, -2.02956039e-13, -1.45906397e-12, -8.12234427e-14,
           -9.55450657e-13, -1.30039646e-12, -1.07947378e-12, -1.08345113e-12,
           -1.59000274e-12, -8.08602402e-13, -6.44860817e-13, -5.18191098e-13,
           -5.66762590e-13, -2.29591484e-13, -1.85566519e-13, -1.05031958e-13,
           -1.53445492e-13, -3.83317479e-14, -5.41923598e-14, -4.70300061e-14,
           -2.70197749e-14, -8.38247062e-16, -2.00960536e-15, -6.14130512e-16,
           -1.24468044e-15, -4.01922963e-15,  8.54756834e-16,  6.71081649e-17,
           -5.35323451e-16,  1.09902452e-16, -4.81268135e-16,  4.21930429e-15,
            1.59689958e-14,  1.45423480e-14,  8.54643474e-15, -6.70301174e-14,
           -2.17659478e-13, -6.42007685e-13, -1.05641968e-12, -9.61851422e-13,
           -9.68124537e-03, -5.43199492e-03, -2.02925349e-03, -3.27716605e-04,
            1.09444159e-02, -2.06603474e-16, -1.48153166e-16, -2.55972266e-16,
           -1.98805873e-16, -1.06059060e-16, -1.46806749e-15, -1.30672293e-16,
           -2.75698019e-16, -4.80575050e-16, -2.11066013e-16, -3.44234921e-16,
           -2.54309196e-16,  5.75283223e-16, -8.71629386e-17, -6.15334883e-16,
           -1.03644619e-16, -5.13019067e-16, -6.55711395e-16, -8.17593232e-16,
           -8.48519545e-17,  8.21246329e-17,  1.26381210e-16, -1.13566604e-16,
           -1.49054402e-15,  7.83773982e-17]))]
    Eigenvalues in descending order: 
    [6.400301029851477, 4.23053271502051, 3.0220056962108997, 2.3606995503606907, 1.7227802802980245, 1.7053304684180373, 1.5823948341725016, 1.5166979298471606, 1.4840050960170488, 1.3921255385114626, 1.338123871891159, 1.2745588269563284, 1.2235863310684403, 1.2178118832338365, 1.1943992057853123, 1.1835468203512876, 1.175215028494778, 1.1607311253642982, 1.148470393300462, 1.1194893763589782, 1.1096027631760017, 1.1049386135422454, 1.0914342813543214, 1.0846048546102214, 1.0827345320601791, 1.0711889349293784, 1.0636889264741323, 1.054552899346843, 1.0495564454254236, 1.048150719936755, 1.0416363339168457, 1.0381369601703594, 1.0334519547258465, 1.027681645874375, 1.0238189302225627, 1.015216974820168, 1.0136313723660826, 1.0118982712943438, 1.0092805032957575, 1.0062251615105764, 1.0055267813795965, 1.0041959740870794, 1.0019590929564808, 1.0005162465340969, 0.9936805751250474, 0.9932619924220245, 0.989390243013148, 0.9869860806882016, 0.9828875619572957, 0.9812541978117575, 0.9698086941074693, 0.9529273395960557, 0.949801385112714, 0.9442733887767248, 0.9416943150113966, 0.9337938560734337, 0.931041145818125, 0.9160124331703235, 0.9017616031502895, 0.890644523647056, 0.8885676555317741, 0.8791588942184467, 0.8629753367296573, 0.8317611004894274, 0.8087658281973401, 0.7892722212229611, 0.7619517528659735, 0.754916612665571, 0.7309008552453276, 0.6902086442370304, 0.6604534042250566, 0.6482152267621847, 0.5714116669727295, 0.42504944989687043, 0.3978166894517955, 0.35238333407115874, 0.337764060513702, 0.24537240875649707, 0.2105885520158932, 0.1947049465927752, 0.12332392942132495, 0.07848642552981931, 0.051763400830042, 0.0022475672476950743, 0.0009933514215786233, 0.0001285036483623192, 8.546833259599872e-05, 3.57406919508468e-15, -5.203296557290081e-16, -1.832295599444726e-15]
    
    In [174]:
    tot = sum(eigenvalues)
    var_explained = [(i / tot) for i in sorted(eigenvalues, reverse=True)]  
    
    # an array of variance explained by each 
    
    # eigen vector... there will be 90 entries as there are 90 eigen vectors)
    cum_var_exp = np.cumsum(var_explained)  
    # an array of cumulative variance. There will be 90 entries with 90 th entry cumulative reaching almost 100%
    
    In [175]:
    print(len(var_explained))
    
    print((cum_var_exp))
    
    90
    [0.07111057 0.11811392 0.15168992 0.17791848 0.19705944 0.21600652
     0.23358772 0.250439   0.26692704 0.28239426 0.29726149 0.31142248
     0.32501714 0.33854764 0.35181802 0.36496782 0.37802505 0.39092136
     0.40368144 0.41611953 0.42844778 0.4407242  0.45285059 0.46490109
     0.47693082 0.48883227 0.50065039 0.512367   0.5240281  0.53567358
     0.54724669 0.55878091 0.57026308 0.58168114 0.59305629 0.60433586
     0.61559781 0.62684051 0.63805413 0.6492338  0.66040571 0.67156283
     0.6826951  0.69381134 0.70485163 0.71588727 0.72687989 0.73784581
     0.74876618 0.75966841 0.77044347 0.78103098 0.79158375 0.8020751
     0.8125378  0.82291272 0.83325705 0.84343441 0.85345344 0.86334895
     0.87322138 0.88298928 0.89257737 0.90181865 0.91080445 0.91957366
     0.92803933 0.93642683 0.94454751 0.95221608 0.95955405 0.96675604
     0.97310471 0.97782723 0.98224717 0.98616233 0.98991506 0.99264127
     0.99498101 0.99714428 0.99851447 0.9993865  0.99996161 0.99998659
     0.99999762 0.99999905 1.         1.         1.         1.        ]
    

    From above table we conclude that 96% variance is contributed by about 72 features

    In [176]:
    plt.figure(figsize=(plotSizeX, plotSizeY))
    plt.bar(range(0,90), np.array(var_explained), alpha = 0.5, align='center', label='individual explained variance')
    plt.step(range(0,90), np.array(cum_var_exp), where= 'mid', label='cumulative explained variance')
    plt.ylabel('Explained variance ratio')
    plt.xlabel('Principal components')
    plt.legend(loc = 'best')
    plt.show()
    
    72 dimensions covering 97% variance in the data. So we can reduce to 72 dimension space

    Now will recall the ensemble models from our initial run to check the feature selection using featureimp from individual models

    In [177]:
    #Building fuction to return the feature importances for the model
    predictors = [x for x in dff.columns if x not in ['price']]
    
    def modelfit(alg, dxtrain, dytrain, printFeatureImportance=True):
        #feature importance
        alg.fit(dxtrain,dytrain)
        alg_imp_feature_1=pd.DataFrame(alg.feature_importances_, columns = ["Imp"], index = predictors)
        alg_imp_feature_1.sort_values(by="Imp",ascending=False)
        alg_imp_feature_1['Imp'] = alg_imp_feature_1['Imp'].map('{0:.5f}'.format)
        alg_imp_feature_1=alg_imp_feature_1.sort_values(by="Imp",ascending=False)
        alg_imp_feature_1.Imp=alg_imp_feature_1.Imp.astype("float")
        
        feat_30list=list(alg_imp_feature_1.index[:30])
        
    
        
        if printFeatureImportance:
            alg_imp_feature_1[:30].plot.bar(figsize=(plotSizeX, plotSizeY))
            #First 20 features have an importance of 90.5% and first 30 have importance of 95.15
            print("First 25 feature importance:\t",(alg_imp_feature_1[:25].sum())*100)
            print("First 30 feature importance:\t",(alg_imp_feature_1[:30].sum())*100)
            
        return feat_30list
    

    Will run above function with ensemble models: Gradient boosting, Random forest, Bagging

    In [178]:
    #Gradient boost model
    modelfit(GB1,X_train,y_train)
    
    First 25 feature importance:	 Imp    96.698
    dtype: float64
    First 30 feature importance:	 Imp    98.305
    dtype: float64
    
    Out[178]:
    ['furnished_1',
     'living_measure',
     'yr_built',
     'living_measure15',
     'quality_8',
     'City_Bellevue',
     'City_Seattle',
     'lot_measure15',
     'HouseLandRatio',
     'City_Kent',
     'quality_9',
     'sight_4',
     'City_Federal Way',
     'coast_1',
     'City_Mercer Island',
     'City_Kirkland',
     'City_Medina',
     'City_Redmond',
     'quality_11',
     'ceil_measure',
     'quality_7',
     'City_Renton',
     'City_Maple Valley',
     'quality_6',
     'total_area',
     'quality_10',
     'basement',
     'City_Issaquah',
     'City_Sammamish',
     'condition_5']

    The top 30 features are covering about 98% in gradient boosting model. This is very good coverage for just 30% of the variables

    In [179]:
    #Random Forest model
    modelfit(RF1,X_train,y_train)
    
    First 25 feature importance:	 Imp    93.273
    dtype: float64
    First 30 feature importance:	 Imp    95.008
    dtype: float64
    
    Out[179]:
    ['furnished_1',
     'yr_built',
     'living_measure',
     'living_measure15',
     'quality_8',
     'HouseLandRatio',
     'lot_measure15',
     'quality_9',
     'ceil_measure',
     'City_Bellevue',
     'total_area',
     'lot_measure',
     'City_Seattle',
     'City_Kirkland',
     'City_Kent',
     'City_Federal Way',
     'coast_1',
     'basement',
     'City_Mercer Island',
     'quality_7',
     'City_Redmond',
     'sight_4',
     'City_Renton',
     'City_Maple Valley',
     'City_Medina',
     'City_Sammamish',
     'quality_10',
     'has_renovated_Yes',
     'room_bath_2.5',
     'room_bed_3']

    The top 30 features are covering about 95% in random forest model

    Now will extract the top 30 features from the above models

    In [180]:
    feat_list_GB1=modelfit(GB1,X_train,y_train, printFeatureImportance=False)
    print(feat_list_GB1)
    
    feat_list_RF1=modelfit(RF1,X_train,y_train, printFeatureImportance=False)
    print(feat_list_RF1)
    
    ['furnished_1', 'living_measure', 'yr_built', 'living_measure15', 'quality_8', 'City_Bellevue', 'City_Seattle', 'lot_measure15', 'HouseLandRatio', 'City_Kent', 'quality_9', 'sight_4', 'City_Federal Way', 'coast_1', 'City_Mercer Island', 'City_Kirkland', 'City_Medina', 'City_Redmond', 'quality_11', 'ceil_measure', 'quality_7', 'City_Renton', 'City_Maple Valley', 'quality_6', 'total_area', 'quality_10', 'basement', 'City_Issaquah', 'City_Sammamish', 'condition_5']
    ['furnished_1', 'yr_built', 'living_measure', 'living_measure15', 'quality_8', 'HouseLandRatio', 'lot_measure15', 'quality_9', 'ceil_measure', 'City_Bellevue', 'total_area', 'lot_measure', 'basement', 'City_Kent', 'City_Kirkland', 'City_Federal Way', 'City_Seattle', 'quality_7', 'City_Mercer Island', 'City_Redmond', 'coast_1', 'City_Renton', 'sight_4', 'quality_10', 'City_Maple Valley', 'City_Sammamish', 'City_Medina', 'room_bed_4', 'condition_3', 'City_Issaquah']
    

    From the above 2 feature list, we will consolidate all the features

    In [181]:
    Key_feat=list(set(feat_list_GB1).union(feat_list_RF1))
    print(len(Key_feat))
    print(Key_feat)
    
    33
    ['City_Mercer Island', 'condition_5', 'City_Sammamish', 'yr_built', 'sight_4', 'City_Seattle', 'City_Federal Way', 'City_Maple Valley', 'City_Bellevue', 'furnished_1', 'City_Kent', 'quality_9', 'City_Redmond', 'City_Issaquah', 'quality_8', 'total_area', 'quality_7', 'ceil_measure', 'City_Medina', 'coast_1', 'condition_3', 'lot_measure15', 'HouseLandRatio', 'City_Kirkland', 'City_Renton', 'living_measure15', 'basement', 'room_bed_4', 'quality_6', 'lot_measure', 'quality_10', 'quality_11', 'living_measure']
    

    From two models we have 33 importance features. We will freeze on the above 33 list and make another dataframe (along with 'price')

    In [182]:
    dff33=dff[['price','basement', 'City_Bellevue', 'coast_1', 'HouseLandRatio', 'City_Seattle', 'quality_10', 'quality_9', 'ceil_measure', 'City_Renton', 'City_Redmond', 'City_Federal Way', 'City_Mercer Island', 'yr_built', 'living_measure15', 'living_measure', 'City_Maple Valley', 'sight_3', 'total_area', 'City_Kirkland', 'sight_4', 'quality_6', 'quality_7', 'City_Sammamish', 'quality_8', 'City_Kent', 'quality_12', 'lot_measure', 'condition_3', 'furnished_1', 'City_Issaquah', 'quality_11', 'City_Medina', 'lot_measure15']].copy()
    
    In [183]:
    dff33.shape
    
    Out[183]:
    (18287, 34)
    In [184]:
    dff33.head()
    
    Out[184]:
    price basement City_Bellevue coast_1 HouseLandRatio City_Seattle quality_10 quality_9 ceil_measure City_Renton ... quality_8 City_Kent quality_12 lot_measure condition_3 furnished_1 City_Issaquah quality_11 City_Medina lot_measure15
    17786 430000 0 0 0 19.0 1 0 0 2550 0 ... 1 0 0 11160 1 0 0 0 0 7440
    3782 385500 420 0 0 16.0 0 0 0 1120 0 ... 0 0 0 7947 1 0 0 0 0 7950
    10069 736000 0 1 0 16.0 0 0 1 2290 0 ... 0 0 0 12047 0 1 0 0 0 15666
    7114 580000 970 0 0 24.0 1 0 0 970 0 ... 0 0 0 6000 0 0 0 0 0 6000
    10080 315000 1160 0 0 22.0 1 0 0 1160 0 ... 0 0 0 8100 0 0 0 0 0 7271

    5 rows × 34 columns

    In [185]:
    X3 = dff33.drop("price" , axis=1)
    y3 = dff33["price"]
    
    X3_train, X3_test, y3_train, y3_test = train_test_split(X3, y3, test_size=0.2, random_state=10)
    X3_train, X3_val, y3_train, y3_val = train_test_split(X3_train, y3_train, test_size=0.2, random_state=10)
    
    print(X3_train.shape)
    print(X3_test.shape)
    print(X3_val.shape)
    
    (11703, 33)
    (3658, 33)
    (2926, 33)
    

    Eventhough PCA is helping us to reduce dimensions upto about 60 dimensions, we can see that in our random forest model top 30 features are explaining the 95% variance in the regression and in gradient boosting model top 30 features are covering 98% varience.

    Hence we conclude that we will use features selection by considering the feature importance fucntion in individual models. Thus we extracted 33 important features

    HYPERTUNING with Gridsearch CV

    In [186]:
    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import accuracy_score
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import cross_val_score
    

    Since we have better performance in gradient boosting model, we will hypertune the model for improving the score

    Following are the parameters we tune for the gradient boosting model.

    In [187]:
    param_grid = {
        'loss':['ls','lad','huber'],
        'bootstrap': ['True','False'],
        'max_depth': range(5,11,1),
        'max_features': ['auto','sqrt'],
        'learning_rate': [0.05,0.1,0.2,0.25],
        'min_samples_leaf': [4,10,20],
        'min_samples_split': [5,10,1000],
        'n_estimators': [10,50,100,150,200],
        'subsample':[0.8,1]
    }
    
    In [188]:
    GBR_test=GradientBoostingRegressor(random_state=22)
    

    First will tune each parameter separately

    In [189]:
    param_grid1 = {'n_estimators': range(50,401,50)}
    
    In [190]:
    grid_search1 = GridSearchCV(estimator = GBR_test, param_grid = param_grid1, 
                              cv = 3, n_jobs = 2, verbose = 1)
    
    In [191]:
    grid_search1.fit(X_train,y_train)
    grid_search1.best_params_
    
    Fitting 3 folds for each of 8 candidates, totalling 24 fits
    
    [Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
    [Parallel(n_jobs=2)]: Done  24 out of  24 | elapsed:   57.6s finished
    
    Out[191]:
    {'n_estimators': 400}
    In [192]:
    grid_search1.best_params_, grid_search1.best_score_
    
    Out[192]:
    ({'n_estimators': 400}, 0.7757647547223905)

    n_estimators of 400 is best in range 50 to 400. Will test same until 1000

    In [193]:
    param_grid2 = {'n_estimators': range(400,1001,200)}
    GBR_test=GradientBoostingRegressor(random_state=22)
    
    grid_search2 = GridSearchCV(estimator = GBR_test, param_grid = param_grid2, 
                              cv = 3, n_jobs = 2, verbose = 1)
    grid_search2.fit(X_train,y_train)
    
    Fitting 3 folds for each of 4 candidates, totalling 12 fits
    
    [Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
    [Parallel(n_jobs=2)]: Done  12 out of  12 | elapsed:  1.3min finished
    
    Out[193]:
    GridSearchCV(cv=3, error_score='raise-deprecating',
           estimator=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
                 learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
                 max_leaf_nodes=None, min_impurity_decrease=0.0,
                 min_impurity_split=None, min_samples_leaf=1,
                 min_sampl...te=22, subsample=1.0, tol=0.0001,
                 validation_fraction=0.1, verbose=0, warm_start=False),
           fit_params=None, iid='warn', n_jobs=2,
           param_grid={'n_estimators': range(400, 1001, 200)},
           pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
           scoring=None, verbose=1)
    In [194]:
    grid_search2.cv_results_,grid_search2.best_params_, grid_search2.best_score_
    
    Out[194]:
    ({'mean_fit_time': array([ 7.2032059 , 10.84747616, 14.41415413, 17.8464543 ]),
      'std_fit_time': array([0.0866392 , 0.19536189, 0.14661922, 0.92177025]),
      'mean_score_time': array([0.03063202, 0.04291979, 0.06155936, 0.07675632]),
      'std_score_time': array([0.00097431, 0.00340029, 0.00733648, 0.01039824]),
      'param_n_estimators': masked_array(data=[400, 600, 800, 1000],
                   mask=[False, False, False, False],
             fill_value='?',
                  dtype=object),
      'params': [{'n_estimators': 400},
       {'n_estimators': 600},
       {'n_estimators': 800},
       {'n_estimators': 1000}],
      'split0_test_score': array([0.77559185, 0.77864467, 0.77983937, 0.78052058]),
      'split1_test_score': array([0.76537408, 0.77109939, 0.77235457, 0.7724209 ]),
      'split2_test_score': array([0.78632834, 0.78828157, 0.78829273, 0.78811941]),
      'mean_test_score': array([0.77576475, 0.77934188, 0.78016222, 0.78035363]),
      'std_test_score': array([0.00855542, 0.0070319 , 0.00651073, 0.00640998]),
      'rank_test_score': array([4, 3, 2, 1]),
      'split0_train_score': array([0.86386211, 0.88101725, 0.89106634, 0.89835051]),
      'split1_train_score': array([0.86284551, 0.87877078, 0.88780479, 0.89494197]),
      'split2_train_score': array([0.85757011, 0.87496329, 0.88575537, 0.89367633]),
      'mean_train_score': array([0.86142591, 0.87825044, 0.88820883, 0.89565627]),
      'std_train_score': array([0.00275787, 0.00249875, 0.00218694, 0.00197394])},
     {'n_estimators': 1000},
     0.7803536277850995)
    In [195]:
    param_grid2 = {'n_estimators': range(1000,2000,300)}
    GBR_test=GradientBoostingRegressor(random_state=22)
    
    grid_search2 = GridSearchCV(estimator = GBR_test, param_grid = param_grid2, 
                              cv = 5, n_jobs = 3, verbose = 1)
    grid_search2.fit(X_train,y_train)
    
    Fitting 5 folds for each of 4 candidates, totalling 20 fits
    
    [Parallel(n_jobs=3)]: Using backend LokyBackend with 3 concurrent workers.
    [Parallel(n_jobs=3)]: Done  20 out of  20 | elapsed:  4.1min finished
    
    Out[195]:
    GridSearchCV(cv=5, error_score='raise-deprecating',
           estimator=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
                 learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
                 max_leaf_nodes=None, min_impurity_decrease=0.0,
                 min_impurity_split=None, min_samples_leaf=1,
                 min_sampl...te=22, subsample=1.0, tol=0.0001,
                 validation_fraction=0.1, verbose=0, warm_start=False),
           fit_params=None, iid='warn', n_jobs=3,
           param_grid={'n_estimators': range(1000, 2000, 300)},
           pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
           scoring=None, verbose=1)
    In [196]:
    grid_search2.best_params_, grid_search2.best_score_
    
    Out[196]:
    ({'n_estimators': 1000}, 0.7885965739886799)

    n_estimators of 1000 is giving best result in range 400 to 1000

    In [197]:
    param_grid3 = {
        'learning_rate': [0.1,0.2],
        'min_samples_leaf': [5,10,20],
        'min_samples_split': [5,10,20],
        'n_estimators': [500,1000],
    }
    
    In [198]:
    GBR_test=GradientBoostingRegressor(random_state=22)
    
    grid_search3 = GridSearchCV(estimator = GBR_test, param_grid = param_grid3, 
                              cv = 5, n_jobs = 3, verbose = 1)
    grid_search3.fit(X_train,y_train)
    
    Fitting 5 folds for each of 36 candidates, totalling 180 fits
    
    [Parallel(n_jobs=3)]: Using backend LokyBackend with 3 concurrent workers.
    [Parallel(n_jobs=3)]: Done  44 tasks      | elapsed:  5.1min
    [Parallel(n_jobs=3)]: Done 180 out of 180 | elapsed: 20.3min finished
    
    Out[198]:
    GridSearchCV(cv=5, error_score='raise-deprecating',
           estimator=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
                 learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
                 max_leaf_nodes=None, min_impurity_decrease=0.0,
                 min_impurity_split=None, min_samples_leaf=1,
                 min_sampl...te=22, subsample=1.0, tol=0.0001,
                 validation_fraction=0.1, verbose=0, warm_start=False),
           fit_params=None, iid='warn', n_jobs=3,
           param_grid={'learning_rate': [0.1, 0.2], 'min_samples_leaf': [5, 10, 20], 'min_samples_split': [5, 10, 20], 'n_estimators': [500, 1000]},
           pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
           scoring=None, verbose=1)
    In [199]:
    grid_search3.best_params_, grid_search3.best_score_
    
    Out[199]:
    ({'learning_rate': 0.1,
      'min_samples_leaf': 10,
      'min_samples_split': 5,
      'n_estimators': 1000},
     0.7880978276736184)

    In combination of 4 parameters above values are giving best result. We can see n_estimators of 1000 is best again. Now, will change the ranges of other 3 parameters

    In [200]:
    param_grid4 = {
        'learning_rate': [0.1,0.15],
        'max_depth': [5,10],
        'min_samples_leaf': [5,8],
        'min_samples_split': [20,30],
        'n_estimators': [1000],
    }
    
    In [201]:
    GBR_test=GradientBoostingRegressor(random_state=22)
    
    grid_search4 = GridSearchCV(estimator = GBR_test, param_grid = param_grid4, 
                              cv = 5, n_jobs = 3, verbose = 1)
    grid_search4.fit(X_train,y_train)
    
    Fitting 5 folds for each of 16 candidates, totalling 80 fits
    
    [Parallel(n_jobs=3)]: Using backend LokyBackend with 3 concurrent workers.
    [Parallel(n_jobs=3)]: Done  44 tasks      | elapsed: 23.3min
    [Parallel(n_jobs=3)]: Done  80 out of  80 | elapsed: 45.2min finished
    
    Out[201]:
    GridSearchCV(cv=5, error_score='raise-deprecating',
           estimator=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
                 learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
                 max_leaf_nodes=None, min_impurity_decrease=0.0,
                 min_impurity_split=None, min_samples_leaf=1,
                 min_sampl...te=22, subsample=1.0, tol=0.0001,
                 validation_fraction=0.1, verbose=0, warm_start=False),
           fit_params=None, iid='warn', n_jobs=3,
           param_grid={'learning_rate': [0.1, 0.15], 'max_depth': [5, 10], 'min_samples_leaf': [5, 8], 'min_samples_split': [20, 30], 'n_estimators': [1000]},
           pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
           scoring=None, verbose=1)
    In [202]:
    grid_search4.best_params_, grid_search4.best_score_
    
    Out[202]:
    ({'learning_rate': 0.1,
      'max_depth': 5,
      'min_samples_leaf': 8,
      'min_samples_split': 20,
      'n_estimators': 1000},
     0.7821899364744039)

    Now the score has reduced compared to earlier run

    In [203]:
    param_grid5 = {
        'learning_rate': [0.1],
        'max_depth': [5],
        'min_samples_leaf': [8,10],
        'min_samples_split': [30,40],
        'n_estimators': [1000],
    }
    
    GBR_test=GradientBoostingRegressor(random_state=22)
    
    grid_search5 = GridSearchCV(estimator = GBR_test, param_grid = param_grid5, 
                              cv = 5, n_jobs = 2, verbose = 1)
    grid_search5.fit(X_train,y_train)
    
    Fitting 5 folds for each of 4 candidates, totalling 20 fits
    
    [Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
    [Parallel(n_jobs=2)]: Done  20 out of  20 | elapsed:  7.6min finished
    
    Out[203]:
    GridSearchCV(cv=5, error_score='raise-deprecating',
           estimator=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
                 learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
                 max_leaf_nodes=None, min_impurity_decrease=0.0,
                 min_impurity_split=None, min_samples_leaf=1,
                 min_sampl...te=22, subsample=1.0, tol=0.0001,
                 validation_fraction=0.1, verbose=0, warm_start=False),
           fit_params=None, iid='warn', n_jobs=2,
           param_grid={'learning_rate': [0.1], 'max_depth': [5], 'min_samples_leaf': [8, 10], 'min_samples_split': [30, 40], 'n_estimators': [1000]},
           pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
           scoring=None, verbose=1)
    In [204]:
    grid_search5.best_params_, grid_search5.best_score_
    
    Out[204]:
    ({'learning_rate': 0.1,
      'max_depth': 5,
      'min_samples_leaf': 10,
      'min_samples_split': 40,
      'n_estimators': 1000},
     0.7844535606632613)

    Above score has improved from earlier runs

    In [205]:
    param_grid6 = {
        'learning_rate': [0.1],
        'max_depth': [5],
        'min_samples_leaf': [8],
        'min_samples_split': [40,50],
        'n_estimators': [1000],
    }
    
    GBR_test=GradientBoostingRegressor(random_state=22)
    
    grid_search6 = GridSearchCV(estimator = GBR_test, param_grid = param_grid6, 
                              cv = 5, n_jobs = 2, verbose = 1)
    grid_search6.fit(X_train,y_train)
    
    Fitting 5 folds for each of 2 candidates, totalling 10 fits
    
    [Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
    [Parallel(n_jobs=2)]: Done  10 out of  10 | elapsed:  3.6min finished
    
    Out[205]:
    GridSearchCV(cv=5, error_score='raise-deprecating',
           estimator=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
                 learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
                 max_leaf_nodes=None, min_impurity_decrease=0.0,
                 min_impurity_split=None, min_samples_leaf=1,
                 min_sampl...te=22, subsample=1.0, tol=0.0001,
                 validation_fraction=0.1, verbose=0, warm_start=False),
           fit_params=None, iid='warn', n_jobs=2,
           param_grid={'learning_rate': [0.1], 'max_depth': [5], 'min_samples_leaf': [8], 'min_samples_split': [40, 50], 'n_estimators': [1000]},
           pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
           scoring=None, verbose=1)
    In [206]:
    grid_search6.best_params_, grid_search6.best_score_
    
    Out[206]:
    ({'learning_rate': 0.1,
      'max_depth': 5,
      'min_samples_leaf': 8,
      'min_samples_split': 50,
      'n_estimators': 1000},
     0.7828068526559553)

    There is very marginal improvment in score. We are getting best score at min_samples_split of 40 among 30,40,50.

    Will tune the final set of parameters along with above finalized ones
    In [207]:
    param_grid7 = {
        'loss':['ls','lad','huber'],
        'max_features': ['auto','sqrt'],
        'learning_rate': [0.1],
        'max_depth': [5],
        'min_samples_leaf': [8],
        'min_samples_split': [40],
        'n_estimators': [1000],
        'subsample':[0.8,1]
    }
    
    GBR_test=GradientBoostingRegressor(random_state=22)
    
    grid_search7 = GridSearchCV(estimator = GBR_test, param_grid = param_grid7, 
                              cv = 5, n_jobs = 2, verbose = 1)
    grid_search7.fit(X_train,y_train)
    
    Fitting 5 folds for each of 12 candidates, totalling 60 fits
    
    [Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
    [Parallel(n_jobs=2)]: Done  46 tasks      | elapsed: 12.2min
    [Parallel(n_jobs=2)]: Done  60 out of  60 | elapsed: 14.8min finished
    
    Out[207]:
    GridSearchCV(cv=5, error_score='raise-deprecating',
           estimator=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
                 learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
                 max_leaf_nodes=None, min_impurity_decrease=0.0,
                 min_impurity_split=None, min_samples_leaf=1,
                 min_sampl...te=22, subsample=1.0, tol=0.0001,
                 validation_fraction=0.1, verbose=0, warm_start=False),
           fit_params=None, iid='warn', n_jobs=2,
           param_grid={'loss': ['ls', 'lad', 'huber'], 'max_features': ['auto', 'sqrt'], 'learning_rate': [0.1], 'max_depth': [5], 'min_samples_leaf': [8], 'min_samples_split': [40], 'n_estimators': [1000], 'subsample': [0.8, 1]},
           pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
           scoring=None, verbose=1)
    In [208]:
    grid_search7.best_params_, grid_search7.best_score_
    
    Out[208]:
    ({'learning_rate': 0.1,
      'loss': 'huber',
      'max_depth': 5,
      'max_features': 'sqrt',
      'min_samples_leaf': 8,
      'min_samples_split': 40,
      'n_estimators': 1000,
      'subsample': 1},
     0.7965973506104334)

    There is improvement in the score. will try one more iteration with changing other parameters

    In [209]:
    param_gridF = {
        'loss':['huber'],
        'max_features': ['sqrt'],
        'learning_rate': [0.1,0.2],
        'max_depth': [5,8],
        'min_samples_leaf': [5],
        'min_samples_split': [40,50],
        'n_estimators': [1000],
        'subsample':[1]
    }
    
    GBR_test=GradientBoostingRegressor(random_state=22)
    
    grid_searchF = GridSearchCV(estimator = GBR_test, param_grid = param_gridF, 
                              cv = 5, n_jobs = 2, verbose = 1)
    grid_searchF.fit(X_train,y_train)
    grid_searchF.best_params_,grid_searchF.best_score_
    
    Fitting 5 folds for each of 8 candidates, totalling 40 fits
    
    [Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
    [Parallel(n_jobs=2)]: Done  40 out of  40 | elapsed:  6.0min finished
    
    Out[209]:
    ({'learning_rate': 0.1,
      'loss': 'huber',
      'max_depth': 5,
      'max_features': 'sqrt',
      'min_samples_leaf': 5,
      'min_samples_split': 40,
      'n_estimators': 1000,
      'subsample': 1},
     0.7958994895003749)
  • The above iteration gives best result of 0.799.
  • Final parameters that are giving best result on training set are:
  • 'learning_rate': 0.1, 'loss': 'huber', 'max_depth': 5, 'max_features': 'sqrt', 'min_samples_leaf': 5, 'min_samples_split': 50, 'n_estimators': 1000, 'subsample': 1 </b>

    Hypertuning using graph

    In [210]:
    min_samples_leafs = range(1, 15, 1)
    train_results = []
    val_results = []
    for min_samples_leaf in min_samples_leafs:
       GBR_test=GradientBoostingRegressor(
            loss='huber',
            learning_rate=0.1,
            n_estimators=1000,
            subsample=1.0,
            min_samples_split=40,
            min_samples_leaf=min_samples_leaf,
            max_depth=5,
            random_state=22,
            alpha=0.9,
    )
       GBR_test.fit(X_train,y_train)
       y_GBR_predtr= GBR_test.predict(X_train)
       y_GBR_predvl= GBR_test.predict(X_val)
       
       result_leafs_tr=r2_score(y_GBR_predtr,y_train)
       train_results.append(result_leafs_tr)
       result_leafs_vl=r2_score(y_GBR_predvl,y_val)
       val_results.append(result_leafs_vl)
       
    from matplotlib.legend_handler import HandlerLine2D
    line1, = plt.plot(min_samples_leafs,train_results,"b", label='Train r2')
    line2, = plt.plot(min_samples_leafs, val_results,"r", label='val r2')
    plt.legend(handler_map={line1: HandlerLine2D(numpoints=2)})
    plt.ylabel("r2 score")
    plt.xlabel("min samples leaf")
    plt.show()
    

    From above, min_samples_leaf of 6 is giving best score

    In [211]:
    min_samples_splits = [10,15,30,50,100,500,700,1000]
    train_results_spt = []
    val_results_spt = []
    for min_samples_split in min_samples_splits:
       GBR_test=GradientBoostingRegressor(
            loss='huber',
            learning_rate=0.1,
            n_estimators=1000,
            subsample=1.0,
            min_samples_split=min_samples_split,
            min_samples_leaf=5,
            max_depth=5,
            random_state=22,
            alpha=0.9,
            )
       GBR_test.fit(X_train,y_train)
       y_GBR_predtr= GBR_test.predict(X_train)
       y_GBR_predvl= GBR_test.predict(X_val)
       
       result_spt_tr=r2_score(y_GBR_predtr,y_train)
       train_results_spt.append(result_spt_tr)
       result_spt_vl=r2_score(y_GBR_predvl,y_val)
       val_results_spt.append(result_spt_vl)
       
    from matplotlib.legend_handler import HandlerLine2D
    line1, = plt.plot(min_samples_splits,train_results_spt,"b", label='Train R2')
    line2, = plt.plot(min_samples_splits, val_results_spt,"r", label='Val R2')
    plt.legend(handler_map={line1: HandlerLine2D(numpoints=2)})
    plt.ylabel("R2 score")
    plt.xlabel("min samples split")
    plt.show()
    

    From above, min_samples_splits of about 10 is giving best score. Will try expanding the range around 10

    In [212]:
    min_samples_splits = [10,15,20,30,40,50,60,70,80,90,100]
    train_results_spt = []
    val_results_spt = []
    for min_samples_split in min_samples_splits:
       GBR_test=GradientBoostingRegressor(
            loss='huber',
            learning_rate=0.1,
            n_estimators=1000,
            subsample=1.0,
            min_samples_split=min_samples_split,
            min_samples_leaf=5,
            max_depth=5,
            random_state=22,
            alpha=0.9,
            )
       GBR_test.fit(X_train,y_train)
       y_GBR_predtr= GBR_test.predict(X_train)
       y_GBR_predvl= GBR_test.predict(X_val)
       
       result_spt_tr=r2_score(y_GBR_predtr,y_train)
       train_results_spt.append(result_spt_tr)
       result_spt_vl=r2_score(y_GBR_predvl,y_val)
       val_results_spt.append(result_spt_vl)
       
    from matplotlib.legend_handler import HandlerLine2D
    line1, = plt.plot(min_samples_splits,train_results_spt,"b", label='Train R2')
    line2, = plt.plot(min_samples_splits, val_results_spt,"r", label='Val R2')
    plt.legend(handler_map={line1: HandlerLine2D(numpoints=2)})
    plt.ylabel("R2 score")
    plt.xlabel("min samples split")
    plt.show()
    

    From above, min_samples_splits of about 10 is giving best score

    In [213]:
    min_samples_splits = [7,8,9,10,11,12,13,14,15,20]
    train_results_spt = []
    val_results_spt = []
    for min_samples_split in min_samples_splits:
       GBR_test=GradientBoostingRegressor(
            loss='huber',
            learning_rate=0.1,
            n_estimators=1000,
            subsample=1.0,
            min_samples_split=min_samples_split,
            min_samples_leaf=5,
            max_depth=5,
            random_state=22,
            alpha=0.9,
            )
       GBR_test.fit(X_train,y_train)
       y_GBR_predtr= GBR_test.predict(X_train)
       y_GBR_predvl= GBR_test.predict(X_val)
       
       result_spt_tr=r2_score(y_GBR_predtr,y_train)
       train_results_spt.append(result_spt_tr)
       result_spt_vl=r2_score(y_GBR_predvl,y_val)
       val_results_spt.append(result_spt_vl)
       
    from matplotlib.legend_handler import HandlerLine2D
    line1, = plt.plot(min_samples_splits,train_results_spt,"b", label='Train R2')
    line2, = plt.plot(min_samples_splits, val_results_spt,"r", label='Val R2')
    plt.legend(handler_map={line1: HandlerLine2D(numpoints=2)})
    plt.ylabel("R2 score")
    plt.xlabel("min samples split")
    plt.show()
    

    From above, min_samples_splits of about 12 is giving best score

    In [214]:
    max_depths = range(3,11,1)
    train_results_dpt = []
    val_results_dpt = []
    for max_depth in max_depths:
       GBR_test=GradientBoostingRegressor(
            loss='huber',
            learning_rate=0.1,
            n_estimators=1000,
            subsample=1.0,
            min_samples_split=10,
            min_samples_leaf=6,
            max_depth=max_depth,
            random_state=22,
            alpha=0.9,
            )
       GBR_test.fit(X_train,y_train)
       y_GBR_predtr= GBR_test.predict(X_train)
       y_GBR_predvl= GBR_test.predict(X_val)
       
       result_dpt_tr=r2_score(y_GBR_predtr,y_train)
       train_results_dpt.append(result_dpt_tr)
       result_dpt_vl=r2_score(y_GBR_predvl,y_val)
       val_results_dpt.append(result_dpt_vl)
       
    from matplotlib.legend_handler import HandlerLine2D
    line1, = plt.plot(max_depths,train_results_dpt,"b", label='Train R2')
    line2, = plt.plot(max_depths, val_results_dpt,"r", label='Val R2')
    plt.legend(handler_map={line1: HandlerLine2D(numpoints=2)})
    plt.ylabel("R2 score")
    plt.xlabel("max depth")
    plt.show()
    

    From above, max_depth of about 6 is giving best score for validation set and not overfitting of training set

    In [215]:
    estimators = range(100,1500,100)
    train_results_est = []
    val_results_est = []
    for n_estimators in estimators:
       GBR_test=GradientBoostingRegressor(
            loss='huber',
            learning_rate=0.1,
            n_estimators=n_estimators,
            subsample=1.0,
            min_samples_split=30,
            min_samples_leaf=6,
            max_depth=9,
            random_state=22,
            alpha=0.9,
            )
       GBR_test.fit(X_train,y_train)
       y_GBR_predtr= GBR_test.predict(X_train)
       y_GBR_predvl= GBR_test.predict(X_val)
       
       result_est_tr=r2_score(y_GBR_predtr,y_train)
       train_results_est.append(result_est_tr)
       result_est_vl=r2_score(y_GBR_predvl,y_val)
       val_results_est.append(result_est_vl)
       
    from matplotlib.legend_handler import HandlerLine2D
    line1, = plt.plot(estimators,train_results_est,"b", label='Train R2')
    line2, = plt.plot(estimators, val_results_est,"r", label='Val R2')
    plt.legend(handler_map={line1: HandlerLine2D(numpoints=2)})
    plt.ylabel("R2 score")
    plt.xlabel("n_estimators")
    plt.show()
    

    From above, n_estimators of about 1000 is giving best score

    In [217]:
    param_gridF = {
        'loss':['huber'],
        'max_features': ['sqrt'],
        'learning_rate': [0.1],
        'max_depth': [6],
        'min_samples_leaf': [6],
        'min_samples_split': [12],
        'n_estimators': [1000],
        'subsample':[1]
    }
    
    GBR_test=GradientBoostingRegressor(random_state=22)
    
    grid_searchF = GridSearchCV(estimator = GBR_test, param_grid = param_gridF, 
                              cv = 5, n_jobs = 2, verbose = 1)
    grid_searchF.fit(X_train,y_train)
    grid_searchF.best_score_
    
    Fitting 5 folds for each of 1 candidates, totalling 5 fits
    
    [Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
    [Parallel(n_jobs=2)]: Done   5 out of   5 | elapsed:   58.4s finished
    
    Out[217]:
    0.7934419703161365
    In [218]:
    param_gridF = {
        'loss':['huber'],
        'max_features': ['sqrt'],
        'learning_rate': [0.1],
        'max_depth': [5],
        'min_samples_leaf': [5],
        'min_samples_split': [50],
        'n_estimators': [1000],
        'subsample':[1]
    }
    
    GBR_test=GradientBoostingRegressor(random_state=22)
    
    grid_searchF = GridSearchCV(estimator = GBR_test, param_grid = param_gridF, 
                              cv = 5, n_jobs = 2, verbose = 1)
    grid_searchF.fit(X_train,y_train)
    grid_searchF.best_score_,grid_searchF.best_params_
    
    Fitting 5 folds for each of 1 candidates, totalling 5 fits
    
    [Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
    [Parallel(n_jobs=2)]: Done   5 out of   5 | elapsed:   35.2s finished
    
    Out[218]:
    (0.7928868850462906,
     {'learning_rate': 0.1,
      'loss': 'huber',
      'max_depth': 5,
      'max_features': 'sqrt',
      'min_samples_leaf': 5,
      'min_samples_split': 50,
      'n_estimators': 1000,
      'subsample': 1})

    We can conclude from above that gridsearch CV is giving better results compared to that of tuning done by graphical method of individual parameters

  • Final parameters that are giving best result on training set are:
  • 'learning_rate': 0.1, 'loss': 'huber', 'max_depth': 5, 'max_features': 'sqrt', 'min_samples_leaf': 5, 'min_samples_split': 50, 'n_estimators': 1000, 'subsample': 1 </b>

    CONFIDENCE INTERVAL

    In [219]:
    GBR_bestparam=GradientBoostingRegressor(
            loss='huber',
            learning_rate=0.1,
            n_estimators=1000,
            subsample=1.0,
            min_samples_split=50,
            min_samples_leaf=5,
            max_depth=5,
            random_state=22,
            alpha=0.9,
            )
    GBR_bestparam.fit(X_train,y_train)
    y_GBRF_predtr= GBR_bestparam.predict(X_train)
    y_GBRF_predvl= GBR_bestparam.predict(X_val)
    y_GBRF_predts= GBR_bestparam.predict(X_test)
    
    In [220]:
    #Model score and Deduction for each Model in a DataFrame
    GBRF_trscore=r2_score(y_train,y_GBRF_predtr)
    GBRF_trRMSE=np.sqrt(mean_squared_error(y_train, y_GBRF_predtr))
    GBRF_trMSE=mean_squared_error(y_train, y_GBRF_predtr)
    GBRF_trMAE=mean_absolute_error(y_train, y_GBRF_predtr)
    
    GBRF_vlscore=r2_score(y_val,y_GBRF_predvl)
    GBRF_vlRMSE=np.sqrt(mean_squared_error(y_val, y_GBRF_predvl))
    GBRF_vlMSE=mean_squared_error(y_val, y_GBRF_predvl)
    GBRF_vlMAE=mean_absolute_error(y_val, y_GBRF_predvl)
    
    GBRF_tsscore=r2_score(y_test,y_GBRF_predts)
    GBRF_tsRMSE=np.sqrt(mean_squared_error(y_test, y_GBRF_predts))
    GBRF_tsMSE=mean_squared_error(y_test, y_GBRF_predts)
    GBRF_tsMAE=mean_absolute_error(y_test, y_GBRF_predts)
    
    GBRF_df=pd.DataFrame({'Method':['GBRF'],'Val Score':GBRF_vlscore,'RMSE_vl': GBRF_vlRMSE, 'MSE_vl': GBRF_vlMSE,'train Score':GBRF_trscore,'RMSE_tr': GBRF_trRMSE, 'MSE_tr': GBRF_trMSE,'test Score':GBRF_tsscore,'RMSE_ts': GBRF_tsRMSE, 'MSE_ts': GBRF_tsMSE})
    
    
    GBRF_df
    
    Out[220]:
    Method Val Score RMSE_vl MSE_vl train Score RMSE_tr MSE_tr test Score RMSE_ts MSE_ts
    0 GBRF 0.80096 115867.988855 1.342539e+10 0.898909 81372.879729 6.621546e+09 0.793584 114695.310542 1.315501e+10
    In [221]:
    from sklearn.model_selection import KFold
    from sklearn.model_selection import cross_val_score
    
    num_folds = 50
    seed = 7
    
    kfold = KFold(n_splits=num_folds, random_state=seed)
    model = GradientBoostingRegressor(n_estimators = 200, learning_rate = 0.1, random_state=22)
    results = cross_val_score(GBR_bestparam, X, y, cv=kfold)
    print(results)
    print("Accuracy: %.3f%% (%.3f%%)" % (results.mean()*100.0, results.std()*100.0))
    
    [0.86054651 0.81529    0.80351765 0.86060958 0.79642892 0.85548539
     0.78098527 0.77925365 0.81822936 0.82102096 0.87499995 0.81800409
     0.81853737 0.82096864 0.82206478 0.85415595 0.7952127  0.77879311
     0.85529758 0.83972439 0.76258618 0.80910137 0.80208101 0.82664724
     0.7825543  0.8601369  0.77441922 0.78867005 0.84107987 0.79025948
     0.84773597 0.76865873 0.78487112 0.80018574 0.82324413 0.82243794
     0.74048912 0.82370621 0.82606705 0.83661657 0.79192532 0.8126131
     0.79097264 0.81741328 0.76640402 0.77512715 0.78013298 0.7859921
     0.73054971 0.76721522]
    Accuracy: 80.798% (3.241%)
    
    In [222]:
    from matplotlib import pyplot
    # plot scores
    pyplot.hist(results)
    pyplot.show()
    # confidence intervals
    alpha = 0.95                             # for 95% confidence 
    p = ((1.0-alpha)/2.0) * 100              # tail regions on right and left .25 on each side indicated by P value (border)
    lower = max(0.0, np.percentile(results, p))  
    p = (alpha+((1.0-alpha)/2.0)) * 100
    upper = min(1.0, np.percentile(results, p))
    print('%.1f confidence interval %.1f%% and %.1f%%' % (alpha*100, lower*100, upper*100))
    
    95.0 confidence interval 74.5% and 86.1%
    

    Dataset-1 Final summary:

  • The ensemble models have performed well compared to that of linear,KNN,SVR models
  • The best performance is given by Gradient boosting model with training (score-0.89,RMSE-81372), Validation (score-0.80,RSME-115867), Testing(score-0.79,RMSE-114695) The 95% confidence interval scores range from 0.72 to 0.85.
  • The top key features that drive the price of the property are: 'furnished_1', 'yr_built', 'living_measure','quality_8', 'HouseLandRatio', 'lot_measure15', 'quality_9', 'ceil_measure', 'total_area'.
  • The above data is also reinforced by the analysis done during bivariate analysis.
  • For further improvization, the datasets can be made by treating outliers in different ways and hypertuning the ensemble models.
  • </b>

    Dataset-2

    In [2]:
    import geopandas as gpd
    from shapely.geometry import Point, Polygon
    #For current working directory
    import os
    cwd = os.getcwd()
    
    In [224]:
    ## Need to add file USA ZipCodes_1.xlsx to current working directory to access this data
    USAZip=pd.read_excel("USA ZipCodes_1.xlsx",sheet_name="Sheet8")
    USAZip.head()
    
    Out[224]:
    zipcode City County Type
    0 98001 Auburn King Standard
    1 98002 Auburn King Standard
    2 98003 Federal Way King Standard
    3 98004 Bellevue King Standard
    4 98005 Bellevue King Standard
    In [239]:
    house_df = pd.read_csv('innercity.csv')
    
    In [240]:
    house_df1=house_df.merge(USAZip,how='left',on='zipcode')
    #house_df.drop_duplicates()
    
    house_df.shape
    
    Out[240]:
    (21613, 23)
    In [5]:
    #Add the folder WA to your current working directory
    usa = gpd.read_file(cwd+'\\WA\\WSDOT__City_Limits.shp')
    usa.head()
    gdf = gpd.GeoDataFrame(
        house_df,geometry = [Point(xy) for xy in zip(house_df['long'], house_df['lat'])])
    #We can now plot our ``GeoDataFrame``
    ax=usa[usa.CityName.isin(house_df.City.unique())].plot(
        color='white', edgecolor='black',figsize=(20,8))
    plt.figure(figsize=(15,15))
    gdf.plot(ax=ax, color='green', marker='o',markersize=0.1)
    
    Out[5]:
    <matplotlib.axes._subplots.AxesSubplot at 0x1ccf1142588>
    <Figure size 1080x1080 with 0 Axes>
    In [241]:
    #After analysis in p1 - Dropping 'cid','dayhours','basement','yr_built','yr_renovated','zipcode','lat','long','County','Type',
    #'geometry','quality_group','month_year' columns.
    cols=['cid','dayhours']
    house_df_1=house_df.drop(cols, inplace = False, axis = 1)
    

    The dataset worked earlier are giving r2 score on validation set in range 70%-75% with RMSE in range of 96000 to 155000. Trying with a different dataset to see if this could be improved further.

    For analysis in this iteration categorizing coast, furnished and quality. As in previous version tranformed many features but not got desired result.

    TREATING OUTLIERS

    Removing data points which fall into below criteria:

    1. living_measure greater than 9000
    2. price greater than 4000000
    3. romm_bed greater than 10
    4. room_bath greater than 6

    We have lost 20 records which is 0.09% of the data available. These records are extreme values for which we dont have much of data to provide their better estimate. Hence removing them.

    In [242]:
    house_df_2=house_df_1[(house_df['living_measure']<=9000) & (house_df_1['price']<=4000000) & 
                          (house_df_1['room_bed']<=10) & (house_df_1['room_bath']<=6) ]
    house_df_2.shape
    
    Out[242]:
    (21593, 21)
    In [243]:
    house_df_2.columns
    
    Out[243]:
    Index(['price', 'room_bed', 'room_bath', 'living_measure', 'lot_measure',
           'ceil', 'coast', 'sight', 'condition', 'quality', 'ceil_measure',
           'basement', 'yr_built', 'yr_renovated', 'zipcode', 'lat', 'long',
           'living_measure15', 'lot_measure15', 'furnished', 'total_area'],
          dtype='object')
    In [252]:
    # Convert into dummies
    house_df_final = pd.get_dummies(house_df_2, columns=['coast', 'quality', 'furnished'],drop_first=True)
    
    In [253]:
    house_df_final.columns
    
    Out[253]:
    Index(['price', 'room_bed', 'room_bath', 'living_measure', 'lot_measure',
           'ceil', 'sight', 'condition', 'ceil_measure', 'basement', 'yr_built',
           'yr_renovated', 'zipcode', 'lat', 'long', 'living_measure15',
           'lot_measure15', 'total_area', 'coast_1', 'quality_3', 'quality_4',
           'quality_5', 'quality_6', 'quality_7', 'quality_8', 'quality_9',
           'quality_10', 'quality_11', 'quality_12', 'quality_13', 'furnished_1'],
          dtype='object')
    In [254]:
    house_df_final.shape
    
    Out[254]:
    (21593, 31)
    In [268]:
    #Final Data columns
    house_df_final.columns
    
    Out[268]:
    Index(['price', 'room_bed', 'room_bath', 'living_measure', 'lot_measure',
           'ceil', 'sight', 'condition', 'ceil_measure', 'basement', 'yr_built',
           'yr_renovated', 'zipcode', 'lat', 'long', 'living_measure15',
           'lot_measure15', 'total_area', 'coast_1', 'quality_3', 'quality_4',
           'quality_5', 'quality_6', 'quality_7', 'quality_8', 'quality_9',
           'quality_10', 'quality_11', 'quality_12', 'quality_13', 'furnished_1'],
          dtype='object')

    Shows the Data Correlation between Attributes with Heatmap

    In [256]:
    #total_area is highly correlated with lot_measure, ceil_measure is highly correlated with living_measure
    house_corr_2 = house_df_final.corr(method ='pearson')
    house_corr_2.to_excel("house_corr_2.xls")
    
    plt.figure(figsize=(35,20))
    sns.heatmap(house_corr_2,cmap="coolwarm", annot=True,annot_kws={"size":9},fmt='.2')
    
    Out[256]:
    <matplotlib.axes._subplots.AxesSubplot at 0x225943454a8>
    In [257]:
    #creating a copy of the final dataframe
    dff2=house_df_final.copy()
    
    In [258]:
    df_train, df_test = train_test_split(dff2, test_size=0.2, random_state=10)
    df_train, df_val = train_test_split(df_train, test_size=0.2, random_state=10)
    
    In [259]:
    print(df_train.shape)
    print(df_test.shape)
    print(df_val.shape)
    
    (13819, 31)
    (4319, 31)
    (3455, 31)
    
    In [260]:
    # Split the 'df_train' set into X and y
    X_train2 = df_train.drop(['price'],axis=1)
    y_train2 = df_train['price']
    len_train=len(X_train2)
    X_train2.shape
    y_train2.head()
    
    Out[260]:
    1320     330000
    16628    245000
    2923     369000
    15818    532000
    4665     506400
    Name: price, dtype: int64
    In [261]:
    # Split the 'df_val' set into X and y
    X_val2 = df_val.drop(['price'],axis=1)
    y_val2 = df_val['price']
    len_val=len(X_val2)
    X_val2.shape
    y_val2.head()
    
    Out[261]:
    6030     225000
    16781    373500
    17420    325000
    4147     260000
    17992    233000
    Name: price, dtype: int64
    In [262]:
    # Split the 'df_test' set into X and y
    X_test2 = df_test.drop(['price'],axis=1)
    y_test2 = df_test['price']
    X_test2.shape
    len_test=len(X_test2)
    y_test2.head()
    
    Out[262]:
    19155    510000
    10450    264500
    14277    266000
    7601     735000
    6563     600000
    Name: price, dtype: int64
    Will use XGboost model apart from models that used earlier on dataset-1

    Creating Dataframe for Results and Function to compute the scores for each model on its Train and Validation datasets

    In [24]:
    #Creating empty dataframe to capture results
    result_dff=pd.DataFrame()
    
    In [25]:
    #Function to give results of the models for its train and validation dataset.
    #as input it requries model name to display, algorithm, train indepedent variables, train dependent variable, 
    #validation indepedent variables, validation dependent variable.
    def result (model,pipe_model,X_train_set,y_train_set,X_val_set,y_val_set):
        pipe_model.fit(X_train_set,y_train_set)
        #predicting result over test data
        y_train_predict= pipe_model.predict(X_train_set)
        y_val_predict= pipe_model.predict(X_val_set)
    
        trscore=r2_score(y_train_set,y_train_predict)
        trRMSE=np.sqrt(mean_squared_error(y_train_set,y_train_predict))
        trMSE=mean_squared_error(y_train_set,y_train_predict)
        trMAE=mean_absolute_error(y_train_set,y_train_predict)
    
        vlscore=r2_score(y_val,y_val_predict)
        vlRMSE=np.sqrt(mean_squared_error(y_val,y_val_predict))
        vlMSE=mean_squared_error(y_val,y_val_predict)
        vlMAE=mean_absolute_error(y_val,y_val_predict)
        result_df=pd.DataFrame({'Method':[model],'val score':vlscore,'RMSE_val':vlRMSE,'MSE_val':vlMSE,'MAE_vl': vlMAE,
                              'train Score':trscore,'RMSE_tr': trRMSE,'MSE_tr': trMSE, 'MAE_tr': trMAE})
        #Plot between actual and predicted values
        plt.figure(figsize=(18,10))
        sns.lineplot(range(len(y_val_set)),y_val_set,color='blue',linewidth=1.5)
        sns.lineplot(range(len(y_val_set)),y_val_predict,color='hotpink',linewidth=.5)
        plt.title('Actual and Predicted', fontsize=20)       # Plot heading 
        plt.xlabel('Index', fontsize=10)                           # X-label
        plt.ylabel('Values', fontsize=10)                          # Y-label
    
        return result_df
    

    LINEAR REGRESSION

    In [26]:
    #Starting with RFE first as there are many features
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import Pipeline
    from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
    
    In [27]:
    clf=LinearRegression()
    pipe_lr = Pipeline([('LR', clf)])
    result_dff=pd.concat([result_dff,result('Linear Reg',pipe_lr,X_train,y_train,X_val,y_val)])
    result_dff
    
    Out[27]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    In [28]:
    #checking the magnitude of coefficients
    predictors = X_train.columns
    coef = pd.Series(clf.coef_,predictors).sort_values()
    coef.plot(kind='bar', title='Model Coefficients',color='darkblue',figsize=(10,5))
    
    Out[28]:
    <matplotlib.axes._subplots.AxesSubplot at 0x1ccf527d438>

    RIDGE REGRESSION

    In [29]:
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import Pipeline
    from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
    from sklearn.preprocessing import StandardScaler
    
    In [30]:
    clf=Ridge()
    pipe_ridge = Pipeline([('Ridge', clf)])
    result_dff=pd.concat([result_dff,result('Ridge_Reg_1',pipe_ridge,X_train,y_train,X_val,y_val)])
    result_dff
    
    Out[30]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    In [31]:
    #checking the magnitude of coefficients
    predictors = X_train.columns
    coef = pd.Series(clf.coef_,predictors).sort_values()
    coef.plot(kind='bar', title='Model Coefficients',color='darkblue',figsize=(10,5))
    
    Out[31]:
    <matplotlib.axes._subplots.AxesSubplot at 0x1ccf5c6cd30>
    In [32]:
    #Iteration 2
    clf=Ridge(alpha=0.08)
    pipe_ridge_1 = Pipeline([('Ridge',clf )])
    result_dff=pd.concat([result_dff,result('Ridge_Reg_2',pipe_ridge_1,X_train,y_train,X_val,y_val)])
    result_dff
    
    Out[32]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    In [33]:
    #checking the magnitude of coefficients
    predictors = X_train.columns
    coef = pd.Series(clf.coef_,predictors).sort_values()
    coef.plot(kind='bar', title='Model Coefficients',color='darkblue',figsize=(10,5))
    
    Out[33]:
    <matplotlib.axes._subplots.AxesSubplot at 0x1ccf5d78358>

    LASSO REGRESSION

    In [34]:
    from sklearn.linear_model import Lasso
    
    In [35]:
    clf=Lasso(alpha=10, max_iter=1000)
    pipe_lasso_1 = Pipeline([('Lasso',clf )])
    result_dff=pd.concat([result_dff,result('Lasso_Reg_1',pipe_lasso_1,X_train,y_train,X_val,y_val)])
    result_dff
    
    Out[35]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    In [36]:
    #checking the magnitude of coefficients
    predictors = X_train.columns
    coef = pd.Series(clf.coef_,predictors).sort_values(ascending=False)
    coef
    
    Out[36]:
    quality_13         1282757.93547
    quality_12          720634.25526
    lat                 603385.77684
    coast_1             515951.63187
    furnished_1         356060.28711
    quality_11          254292.72718
    quality_8            51062.02158
    sight                48526.08682
    quality_3            47977.01520
    room_bath            44364.15660
    condition            35706.89861
    ceil                 28507.08484
    living_measure         126.78842
    living_measure15        33.97555
    yr_renovated            23.50688
    total_area               0.35066
    quality_10              -0.00000
    lot_measure             -0.19029
    lot_measure15           -0.29272
    basement                -8.54716
    ceil_measure           -15.67109
    zipcode               -512.28283
    yr_built             -2269.56051
    quality_7           -16734.79835
    room_bed            -18988.08235
    quality_6           -63515.14160
    quality_4           -89548.16152
    quality_5           -97142.30729
    long               -172566.03480
    quality_9          -177720.13306
    dtype: float64

    KNN Regressor

    In [37]:
    from sklearn.neighbors import KNeighborsRegressor
    
    pipe_knr = Pipeline([('KNNR', KNeighborsRegressor(n_neighbors=20,weights='distance'))])
    result_dff=pd.concat([result_dff,result('KNN Reg',pipe_knr,X_train,y_train,X_val,y_val)])
    result_dff
    
    Out[37]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

    Support Vector Regressor

    In [38]:
    #The model is not performing well at all.
    #from sklearn.svm import SVR
    #from sklearn.preprocessing import StandardScaler
    
    #pipe_svr_1 = Pipeline([('scl', StandardScaler()),('SVR_1', SVR(kernel='rbf'))])
    #result_dff=pd.concat([result_dff,result('SVR_1',pipe_svr_1,X_train_rfe,y_train,X_val_rfe,y_val)])
    #result_dff
    

    DECISION TREE

    In [39]:
    #Feature importance function
    def feat_imp(model,X_data_set):
        imp_feature_1=pd.DataFrame(model.feature_importances_, columns = ["Imp"], index = X_data_set.columns)
        imp_feature_1=imp_feature_1.sort_values(by="Imp",ascending=False)
        print(imp_feature_1)
        
        #feature importance
        plt.figure(figsize=(10,10))
        imp_feature_1[:30].plot.bar(figsize=(15,5))
    
        #First 20 and 30 feature importance sum
        print("\nFirst 8 feature importance:\t",(imp_feature_1[:8].sum())*100)
        print("\nFirst 12 feature importance:\t",(imp_feature_1[:12].sum())*100)
    
    In [40]:
    #Import library
    from sklearn.tree import DecisionTreeRegressor
    
    clf=DecisionTreeRegressor(random_state=1)
    pipe_DT_1=Pipeline([('DT1',clf)])
    result_dff=pd.concat([result_dff,result('DT1',pipe_DT_1,X_train,y_train,X_val,y_val)])
    result_dff
    
    Out[40]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
    0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
    In [41]:
    #Feature importance
    feat_imp(clf,X_train)
    
                         Imp
    furnished_1      0.33440
    living_measure   0.19412
    lat              0.17853
    long             0.06748
    coast_1          0.03510
    ceil_measure     0.03389
    yr_built         0.03233
    living_measure15 0.03192
    lot_measure      0.01480
    zipcode          0.01341
    lot_measure15    0.01192
    total_area       0.00832
    quality_9        0.00781
    room_bath        0.00697
    sight            0.00633
    quality_8        0.00496
    basement         0.00436
    condition        0.00266
    quality_12       0.00247
    quality_10       0.00206
    room_bed         0.00199
    ceil             0.00180
    yr_renovated     0.00080
    quality_13       0.00048
    quality_11       0.00044
    quality_7        0.00030
    quality_6        0.00026
    quality_5        0.00008
    quality_4        0.00000
    quality_3        0.00000
    
    First 8 feature importance:	 Imp   90.77687
    dtype: float64
    
    First 12 feature importance:	 Imp   95.62215
    dtype: float64
    
    <Figure size 720x720 with 0 Axes>

    RANDOM FOREST REGRESSOR

    In [42]:
    from sklearn.ensemble import RandomForestRegressor
    
    In [43]:
    clf=RandomForestRegressor(random_state=2)
    pipe_RF_1=Pipeline([('RF1',clf)])
    result_dff=pd.concat([result_dff,result('RF1',pipe_RF_1,X_train,y_train,X_val,y_val)])
    result_dff
    
    Out[43]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
    0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
    0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
    In [44]:
    #Feature importance
    feat_imp(clf,X_train)
    
                         Imp
    furnished_1      0.30826
    living_measure   0.23477
    lat              0.17234
    long             0.06825
    living_measure15 0.03089
    yr_built         0.02564
    coast_1          0.02493
    sight            0.01985
    ceil_measure     0.01696
    zipcode          0.01531
    lot_measure15    0.01387
    quality_9        0.01243
    total_area       0.01047
    lot_measure      0.00850
    room_bath        0.00705
    basement         0.00688
    quality_8        0.00417
    room_bed         0.00380
    condition        0.00321
    quality_12       0.00262
    yr_renovated     0.00247
    ceil             0.00221
    quality_11       0.00169
    quality_10       0.00148
    quality_13       0.00096
    quality_7        0.00063
    quality_6        0.00030
    quality_5        0.00005
    quality_4        0.00001
    quality_3        0.00000
    
    First 8 feature importance:	 Imp   88.49277
    dtype: float64
    
    First 12 feature importance:	 Imp   94.34987
    dtype: float64
    
    <Figure size 720x720 with 0 Axes>
    In [45]:
    clf=RandomForestRegressor(n_estimators=50,max_depth=18,min_samples_leaf=10,random_state=3)
    pipe_RF_2=Pipeline([('RF2',clf)])
    result_dff=pd.concat([result_dff,result('RF2',pipe_RF_2,X_train,y_train,X_val,y_val)])
    result_dff
    
    Out[45]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
    0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
    0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
    0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
    In [46]:
    #Feature importance
    feat_imp(clf,X_train)
    
                         Imp
    furnished_1      0.34209
    living_measure   0.25693
    lat              0.18194
    long             0.07106
    living_measure15 0.02514
    yr_built         0.02336
    sight            0.01984
    ceil_measure     0.01841
    zipcode          0.01135
    quality_9        0.00908
    coast_1          0.00864
    lot_measure15    0.00801
    total_area       0.00561
    quality_8        0.00449
    lot_measure      0.00336
    room_bath        0.00277
    basement         0.00172
    quality_12       0.00139
    condition        0.00123
    quality_11       0.00095
    room_bed         0.00073
    quality_10       0.00073
    quality_7        0.00044
    ceil             0.00036
    yr_renovated     0.00022
    quality_6        0.00017
    quality_5        0.00001
    quality_4        0.00000
    quality_3        0.00000
    quality_13       0.00000
    
    First 8 feature importance:	 Imp   93.87443
    dtype: float64
    
    First 12 feature importance:	 Imp   97.58247
    dtype: float64
    
    <Figure size 720x720 with 0 Axes>

    Gradient Boost Regressor

    In [47]:
    from sklearn.ensemble import GradientBoostingRegressor
    
    clf=GradientBoostingRegressor(random_state=4)
    pipe_GB_1=Pipeline([('GB1',clf)])
    result_dff=pd.concat([result_dff,result('GB1',pipe_GB_1,X_train,y_train,X_val,y_val)])
    result_dff
    
    Out[47]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
    0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
    0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
    0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
    0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
    In [48]:
    #Feature importance
    feat_imp(clf,X_train)
    
                         Imp
    living_measure   0.32718
    furnished_1      0.21738
    lat              0.17507
    long             0.06494
    living_measure15 0.03217
    coast_1          0.03081
    yr_built         0.03081
    sight            0.02848
    zipcode          0.01718
    quality_9        0.01411
    ceil_measure     0.01139
    quality_12       0.00933
    quality_8        0.00850
    room_bath        0.00848
    quality_11       0.00673
    quality_13       0.00363
    lot_measure15    0.00331
    condition        0.00300
    basement         0.00221
    total_area       0.00147
    yr_renovated     0.00103
    lot_measure      0.00079
    quality_7        0.00052
    ceil             0.00048
    quality_10       0.00046
    room_bed         0.00037
    quality_6        0.00017
    quality_3        0.00000
    quality_4        0.00000
    quality_5        0.00000
    
    First 8 feature importance:	 Imp   90.68436
    dtype: float64
    
    First 12 feature importance:	 Imp   95.88511
    dtype: float64
    
    <Figure size 720x720 with 0 Axes>
    In [49]:
    clf=GradientBoostingRegressor(n_estimators=150,max_depth=5,random_state=5)
    pipe_GB_2=Pipeline([('GB2',clf)])
    result_dff=pd.concat([result_dff,result('GB2',pipe_GB_2,X_train,y_train,X_val,y_val)])
    result_dff
    
    Out[49]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
    0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
    0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
    0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
    0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
    0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
    In [50]:
    #Feature importance
    feat_imp(clf,X_train)
    
                         Imp
    living_measure   0.28697
    furnished_1      0.22921
    lat              0.17826
    long             0.07063
    living_measure15 0.04054
    yr_built         0.03118
    coast_1          0.03031
    quality_9        0.02114
    sight            0.02033
    zipcode          0.01644
    ceil_measure     0.01361
    quality_8        0.00939
    quality_10       0.00815
    total_area       0.00797
    room_bath        0.00609
    lot_measure15    0.00533
    lot_measure      0.00417
    basement         0.00412
    quality_12       0.00352
    quality_11       0.00347
    condition        0.00311
    quality_13       0.00196
    yr_renovated     0.00142
    room_bed         0.00107
    ceil             0.00096
    quality_7        0.00053
    quality_6        0.00009
    quality_5        0.00004
    quality_3        0.00000
    quality_4        0.00000
    
    First 8 feature importance:	 Imp   88.82445
    dtype: float64
    
    First 12 feature importance:	 Imp   94.80237
    dtype: float64
    
    <Figure size 720x720 with 0 Axes>

    XGBOOST REGRESSOR

    In [51]:
    from xgboost.sklearn import XGBRegressor
    
    clf=XGBRegressor(objective='reg:squarederror',random_state=6)
    pipe_XGB_1=Pipeline([('XGB1',clf)])
    result_dff=pd.concat([result_dff,result('XGB1',pipe_XGB_1,X_train,y_train,X_val,y_val)])
    result_dff
    
    Out[51]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
    0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
    0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
    0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
    0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
    0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
    0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
    In [52]:
    #Feature importance
    feat_imp(clf,X_train)
    
                         Imp
    furnished_1      0.44495
    quality_9        0.15441
    living_measure   0.08844
    coast_1          0.04030
    sight            0.03631
    quality_8        0.03330
    lat              0.03246
    long             0.02696
    quality_12       0.02049
    yr_built         0.01917
    living_measure15 0.01869
    room_bath        0.01360
    zipcode          0.01226
    quality_11       0.01098
    quality_7        0.00875
    ceil_measure     0.00861
    quality_13       0.00664
    condition        0.00428
    lot_measure15    0.00364
    yr_renovated     0.00294
    basement         0.00252
    lot_measure      0.00238
    ceil             0.00213
    total_area       0.00198
    quality_6        0.00197
    room_bed         0.00186
    quality_3        0.00000
    quality_4        0.00000
    quality_5        0.00000
    quality_10       0.00000
    
    First 8 feature importance:	 Imp   85.71182
    dtype: float32
    
    First 12 feature importance:	 Imp   92.90607
    dtype: float32
    
    <Figure size 720x720 with 0 Axes>
    In [53]:
    clf=XGBRegressor(n_estimators=150,max_depth=5,random_state=7)
    pipe_XGB_2=Pipeline([('XGB2',clf)])
    result_dff=pd.concat([result_dff,result('XGB2',pipe_XGB_2,X_train,y_train,X_val,y_val)])
    result_dff
    
    [18:09:21] WARNING: src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
    
    Out[53]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
    0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
    0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
    0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
    0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
    0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
    0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
    0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
    In [54]:
    #Feature importance
    feat_imp(clf,X_train)
    
                         Imp
    furnished_1      0.59499
    living_measure   0.06767
    quality_9        0.06470
    coast_1          0.04365
    quality_8        0.03345
    lat              0.03122
    quality_10       0.03027
    sight            0.02396
    long             0.01961
    quality_12       0.01526
    living_measure15 0.01122
    yr_built         0.01016
    quality_11       0.00679
    quality_13       0.00676
    zipcode          0.00626
    ceil_measure     0.00527
    quality_7        0.00439
    condition        0.00407
    total_area       0.00362
    room_bath        0.00278
    lot_measure15    0.00274
    yr_renovated     0.00219
    lot_measure      0.00217
    basement         0.00205
    quality_6        0.00153
    ceil             0.00153
    room_bed         0.00113
    quality_5        0.00055
    quality_4        0.00000
    quality_3        0.00000
    
    First 8 feature importance:	 Imp   88.99167
    dtype: float32
    
    First 12 feature importance:	 Imp   94.61620
    dtype: float32
    
    <Figure size 720x720 with 0 Axes>

    ADABOOST REGRESSOR

    In [55]:
    from sklearn.ensemble import AdaBoostRegressor
    
    clf= AdaBoostRegressor(DecisionTreeRegressor(random_state=8))
    pipe_ADAB_1=Pipeline([('ADAB1',clf)])
    result_dff=pd.concat([result_dff,result('ADAB1',pipe_ADAB_1,X_train,y_train,X_val,y_val)])
    result_dff
    
    Out[55]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
    0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
    0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
    0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
    0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
    0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
    0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
    0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
    0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
    In [56]:
    #Feature importance
    feat_imp(clf,X_train)
    
                         Imp
    living_measure   0.50994
    lat              0.09959
    furnished_1      0.06601
    long             0.06142
    coast_1          0.04096
    living_measure15 0.04011
    sight            0.03042
    ceil_measure     0.02662
    yr_built         0.01886
    lot_measure15    0.01721
    zipcode          0.01391
    total_area       0.01116
    room_bath        0.01004
    lot_measure      0.00888
    quality_11       0.00824
    basement         0.00793
    quality_12       0.00540
    quality_13       0.00373
    quality_9        0.00355
    room_bed         0.00343
    ceil             0.00261
    yr_renovated     0.00252
    condition        0.00235
    quality_8        0.00226
    quality_10       0.00209
    quality_7        0.00055
    quality_6        0.00017
    quality_5        0.00002
    quality_4        0.00000
    quality_3        0.00000
    
    First 8 feature importance:	 Imp   87.50772
    dtype: float64
    
    First 12 feature importance:	 Imp   93.62163
    dtype: float64
    
    <Figure size 720x720 with 0 Axes>
    In [57]:
    clf= AdaBoostRegressor(DecisionTreeRegressor(max_depth=20),n_estimators=250,learning_rate=0.005,random_state=9)
    pipe_ADAB_2=Pipeline([('ADAB2',clf)])
    result_dff=pd.concat([result_dff,result('ADAB2',pipe_ADAB_2,X_train,y_train,X_val,y_val)])
    result_dff
    
    Out[57]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
    0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
    0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
    0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
    0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
    0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
    0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
    0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
    0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
    0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
    In [58]:
    #Feature importance
    feat_imp(clf,X_train)
    
                         Imp
    living_measure   0.31020
    furnished_1      0.22982
    lat              0.16848
    long             0.07221
    living_measure15 0.03353
    coast_1          0.02876
    yr_built         0.02456
    ceil_measure     0.02078
    sight            0.01669
    zipcode          0.01550
    lot_measure15    0.01533
    total_area       0.01060
    lot_measure      0.00862
    quality_9        0.00846
    room_bath        0.00701
    basement         0.00560
    quality_8        0.00364
    room_bed         0.00335
    condition        0.00317
    quality_12       0.00265
    quality_11       0.00260
    yr_renovated     0.00229
    ceil             0.00218
    quality_10       0.00175
    quality_13       0.00100
    quality_7        0.00080
    quality_6        0.00032
    quality_5        0.00008
    quality_4        0.00001
    quality_3        0.00000
    
    First 8 feature importance:	 Imp   88.83441
    dtype: float64
    
    First 12 feature importance:	 Imp   94.64739
    dtype: float64
    
    <Figure size 720x720 with 0 Axes>

    BAGGING REGRESSION

    In [59]:
    from sklearn.ensemble import BaggingRegressor
    
    clf= BaggingRegressor(random_state=10)
    pipe_BAG_1=Pipeline([('BAG1',clf)])
    result_dff=pd.concat([result_dff,result('BAG1',pipe_BAG_1,X_train,y_train,X_val,y_val)])
    result_dff
    
    Out[59]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
    0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
    0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
    0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
    0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
    0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
    0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
    0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
    0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
    0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
    0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947
    In [60]:
    #Feature Importance
    feature_importances = np.mean([ tree.feature_importances_ for tree in clf.estimators_], axis=0)
    bg_imp_feature=pd.DataFrame(feature_importances, columns = ["Imp"],index=X_train.columns)
    bg_imp_feature.sort_values(by="Imp",ascending=False)
    
    Out[60]:
    Imp
    furnished_1 0.32952
    living_measure 0.21044
    lat 0.17412
    long 0.06964
    living_measure15 0.03440
    yr_built 0.03000
    coast_1 0.02448
    ceil_measure 0.01991
    zipcode 0.01548
    sight 0.01531
    lot_measure15 0.01498
    total_area 0.00974
    quality_9 0.00967
    lot_measure 0.00809
    room_bath 0.00737
    basement 0.00434
    room_bed 0.00403
    quality_8 0.00399
    condition 0.00313
    yr_renovated 0.00237
    ceil 0.00228
    quality_11 0.00182
    quality_12 0.00148
    quality_10 0.00137
    quality_13 0.00084
    quality_7 0.00068
    quality_6 0.00042
    quality_5 0.00010
    quality_4 0.00001
    quality_3 0.00000
    In [61]:
    clf= BaggingRegressor(DecisionTreeRegressor(max_depth=12),n_estimators=250,random_state=11)
    pipe_BAG_2=Pipeline([('BAG2',clf)])
    result_dff=pd.concat([result_dff,result('BAG2',pipe_BAG_2,X_train,y_train,X_val,y_val)])
    result_dff
    
    Out[61]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
    0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
    0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
    0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
    0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
    0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
    0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
    0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
    0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
    0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
    0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947
    0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140
    In [62]:
    #Feature Importance
    pd.options.display.float_format = '{:.5f}'.format
    feature_importances = np.mean([ tree.feature_importances_ for tree in clf.estimators_], axis=0)
    bg_imp_feature=pd.DataFrame(feature_importances, columns = ["Imp"],index=X_train.columns)
    bg_imp_feature.sort_values(by="Imp",ascending=False)
    
    Out[62]:
    Imp
    furnished_1 0.31748
    living_measure 0.23735
    lat 0.17613
    long 0.06834
    living_measure15 0.02947
    coast_1 0.02891
    yr_built 0.02585
    ceil_measure 0.01984
    sight 0.01504
    zipcode 0.01456
    lot_measure15 0.01222
    quality_9 0.00935
    total_area 0.00814
    lot_measure 0.00665
    room_bath 0.00590
    basement 0.00445
    quality_8 0.00431
    quality_12 0.00272
    quality_11 0.00231
    condition 0.00229
    room_bed 0.00227
    yr_renovated 0.00182
    ceil 0.00158
    quality_10 0.00150
    quality_13 0.00076
    quality_7 0.00048
    quality_6 0.00021
    quality_5 0.00006
    quality_4 0.00000
    quality_3 0.00000
    In [ ]:
     
    
    Dataset-2 model performance Summary

    We have used Linear Regression, Ridge and Lasso, KNN, Ensemble Techniques - Decision Trees, Random Forest, Bagging, AdaBoost, Gradient Boost and XGBoost - its gradient boost with regularization and its faster. R2 score on validation in range 70%-87% with RMSE in range 76000-107000. The model is showing better results.Lets hypertune to see if results could be improved further. Will use Random Forest, Gradient Boosting, XGBoost and AdaBoost hypertuning. Dropping features which are zero or very close to zero in all above 4 algos - quality_12, quality_3, quality_4.

    Kindly refer Excel sheet to compare the results.

    In [ ]:
    #Dropping features
    X_train_ht=X_train.drop(['quality_5', 'quality_3', 'quality_4'],1)
    X_test_ht=X_test.drop(['quality_5', 'quality_3', 'quality_4'],1)
    X_val_ht=X_val.drop(['quality_5', 'quality_3', 'quality_4'],1)
    
    In [ ]:
    skf = KFold(n_splits=5, random_state=12)
    

    RANDOM FOREST HYPERTUNE

    In [65]:
    #Tuning of Random Forest
    RF_ht = RandomForestRegressor()
    
    params = {"n_estimators": np.arange(76,84,1),"max_depth": np.arange(16,20,1),
              "max_features":np.arange(6,9,1),'min_samples_leaf': range(5, 8, 1),
        'min_samples_split': range(18, 20, 1)}
    
    RF_GV_1 = GridSearchCV(estimator = RF_ht, param_grid = params,cv=skf,verbose=1,return_train_score=True,n_jobs=2)
    RF_GV_1.fit(X_train_ht,y_train) 
    
    Fitting 5 folds for each of 576 candidates, totalling 2880 fits
    
    [Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
    [Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:   35.0s
    [Parallel(n_jobs=2)]: Done 196 tasks      | elapsed:  2.3min
    [Parallel(n_jobs=2)]: Done 446 tasks      | elapsed:  5.3min
    [Parallel(n_jobs=2)]: Done 796 tasks      | elapsed: 10.0min
    [Parallel(n_jobs=2)]: Done 1246 tasks      | elapsed: 15.5min
    [Parallel(n_jobs=2)]: Done 1796 tasks      | elapsed: 22.5min
    [Parallel(n_jobs=2)]: Done 2446 tasks      | elapsed: 30.7min
    [Parallel(n_jobs=2)]: Done 2880 out of 2880 | elapsed: 36.6min finished
    
    Out[65]:
    GridSearchCV(cv=KFold(n_splits=5, random_state=12, shuffle=False),
           error_score='raise-deprecating',
           estimator=RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
               max_features='auto', max_leaf_nodes=None,
               min_impurity_decrease=0.0, min_impurity_split=None,
               min_samples_leaf=1, min_samples_split=2,
               min_weight_fraction_leaf=0.0, n_estimators='warn', n_jobs=None,
               oob_score=False, random_state=None, verbose=0, warm_start=False),
           fit_params=None, iid='warn', n_jobs=2,
           param_grid={'n_estimators': array([76, 77, 78, 79, 80, 81, 82, 83]), 'max_depth': array([16, 17, 18, 19]), 'max_features': array([6, 7, 8]), 'min_samples_leaf': range(5, 8), 'min_samples_split': range(18, 20)},
           pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
           scoring=None, verbose=1)
    In [66]:
    # results of grid search CV
    RF_results = pd.DataFrame(RF_GV_1.cv_results_)
    
    #parameters best value
    best_score_rf = RF_GV_1.best_score_
    best_rf = RF_GV_1.best_params_
    best_rf
    
    Out[66]:
    {'max_depth': 18,
     'max_features': 8,
     'min_samples_leaf': 5,
     'min_samples_split': 18,
     'n_estimators': 81}
    In [67]:
    rf_best = RandomForestRegressor(max_depth= 18, max_features= 8,n_estimators=80,min_samples_leaf=5,min_samples_split=18,
                                    random_state=14)
    
    result_dff=pd.concat([result_dff,result('RF_ht',rf_best,X_train_ht,y_train,X_val_ht,y_val)])
    result_dff
    
    Out[67]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
    0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
    0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
    0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
    0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
    0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
    0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
    0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
    0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
    0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
    0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947
    0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140
    0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898
    In [68]:
    #Feature importance
    feat_imp(rf_best,X_train_ht)
    
                         Imp
    living_measure   0.20762
    furnished_1      0.16841
    lat              0.15958
    living_measure15 0.08067
    ceil_measure     0.07752
    long             0.05371
    room_bath        0.04081
    yr_built         0.03216
    sight            0.02628
    zipcode          0.02266
    quality_9        0.02174
    coast_1          0.01627
    basement         0.01216
    quality_8        0.01187
    total_area       0.01125
    lot_measure15    0.01110
    quality_11       0.00995
    lot_measure      0.00946
    quality_10       0.00673
    quality_7        0.00643
    condition        0.00437
    quality_12       0.00272
    quality_6        0.00216
    room_bed         0.00135
    yr_renovated     0.00130
    ceil             0.00118
    quality_13       0.00057
    
    First 8 feature importance:	 Imp   82.04737
    dtype: float64
    
    First 12 feature importance:	 Imp   90.74157
    dtype: float64
    
    <Figure size 720x720 with 0 Axes>

    GRADIENT BOOST HYPERTUNE

    In [69]:
    GB_ht=GradientBoostingRegressor()
    params = {"n_estimators": [138,142,1],"learning_rate":[0.08,0.09],"max_depth": np.arange(8, 11,1),
              "max_features":np.arange(5,8,1),'min_samples_leaf': range(16, 21, 1)}
    GB_GV_1 = GridSearchCV(estimator = GB_ht, param_grid = params,cv=skf,verbose=1,return_train_score=True,n_jobs=2)
    GB_GV_1.fit(X_train_ht,y_train) 
    
    # results of grid search CV
    GB_results = pd.DataFrame(GB_GV_1.cv_results_)
    #parameters best value
    best_score_rf = GB_GV_1.best_score_
    best_gb = GB_GV_1.best_params_
    best_gb
    
    Fitting 5 folds for each of 270 candidates, totalling 1350 fits
    
    [Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
    [Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:   20.2s
    [Parallel(n_jobs=2)]: Done 196 tasks      | elapsed:  1.5min
    [Parallel(n_jobs=2)]: Done 446 tasks      | elapsed:  3.9min
    [Parallel(n_jobs=2)]: Done 796 tasks      | elapsed:  7.1min
    [Parallel(n_jobs=2)]: Done 1246 tasks      | elapsed: 11.2min
    [Parallel(n_jobs=2)]: Done 1350 out of 1350 | elapsed: 12.4min finished
    
    Out[69]:
    {'learning_rate': 0.09,
     'max_depth': 8,
     'max_features': 7,
     'min_samples_leaf': 17,
     'n_estimators': 142}
    In [70]:
    gb_best = GradientBoostingRegressor(learning_rate= 0.09, n_estimators= 150,max_depth= 10, 
                                               max_features= 7,min_samples_leaf=19)
    
    result_dff=pd.concat([result_dff,result('GB_ht',gb_best,X_train_ht,y_train,X_val_ht,y_val)])
    result_dff
    
    Out[70]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
    0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
    0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
    0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
    0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
    0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
    0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
    0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
    0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
    0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
    0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947
    0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140
    0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898
    0 GB_ht 0.89602 109723.26914 12039195791.19686 64651.56480 0.96775 62595.11219 3918148069.74015 41729.66244
    In [71]:
    #Feature importance
    feat_imp(gb_best,X_train_ht)
    
                         Imp
    living_measure   0.21739
    lat              0.15607
    furnished_1      0.13874
    living_measure15 0.10908
    long             0.06063
    ceil_measure     0.05424
    room_bath        0.04955
    sight            0.03128
    yr_built         0.02943
    coast_1          0.02644
    zipcode          0.02530
    lot_measure15    0.01702
    quality_9        0.01336
    total_area       0.01053
    lot_measure      0.01020
    basement         0.00833
    condition        0.00829
    quality_7        0.00649
    quality_12       0.00592
    quality_8        0.00494
    quality_11       0.00487
    quality_10       0.00320
    quality_6        0.00318
    room_bed         0.00243
    yr_renovated     0.00185
    ceil             0.00125
    quality_13       0.00000
    
    First 8 feature importance:	 Imp   81.69770
    dtype: float64
    
    First 12 feature importance:	 Imp   91.51647
    dtype: float64
    
    <Figure size 720x720 with 0 Axes>

    ADABOOST HYPERTUNE

    In [72]:
    ADAB_ht=AdaBoostRegressor(DecisionTreeRegressor(max_depth=28))
    params = {"n_estimators": [176,182,1],"learning_rate":[0.4,0.5,0.6],'loss':['linear','square']}
    ADAB_GV_1 = GridSearchCV(estimator = ADAB_ht, param_grid = params,cv=skf,verbose=1,return_train_score=True,n_jobs=2)
    ADAB_GV_1.fit(X_train_ht,y_train) 
    
    Fitting 5 folds for each of 18 candidates, totalling 90 fits
    
    [Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
    [Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:  3.9min
    [Parallel(n_jobs=2)]: Done  90 out of  90 | elapsed:  7.5min finished
    
    Out[72]:
    GridSearchCV(cv=KFold(n_splits=5, random_state=12, shuffle=False),
           error_score='raise-deprecating',
           estimator=AdaBoostRegressor(base_estimator=DecisionTreeRegressor(criterion='mse', max_depth=28, max_features=None,
               max_leaf_nodes=None, min_impurity_decrease=0.0,
               min_impurity_split=None, min_samples_leaf=1,
               min_samples_split=2, min_weight_fraction_leaf=0.0,
               presort=False, random_state=None, splitter='best'),
             learning_rate=1.0, loss='linear', n_estimators=50,
             random_state=None),
           fit_params=None, iid='warn', n_jobs=2,
           param_grid={'n_estimators': [176, 182, 1], 'learning_rate': [0.4, 0.5, 0.6], 'loss': ['linear', 'square']},
           pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
           scoring=None, verbose=1)
    In [73]:
    # results of grid search CV
    ADAB_results = pd.DataFrame(ADAB_GV_1.cv_results_)
    #parameters best value
    best_score_rf = ADAB_GV_1.best_score_
    best_adab = ADAB_GV_1.best_params_
    best_adab
    
    Out[73]:
    {'learning_rate': 0.5, 'loss': 'linear', 'n_estimators': 176}
    In [74]:
    adab_best = AdaBoostRegressor(DecisionTreeRegressor(max_depth=28),n_estimators=180,learning_rate=0.5,loss='linear',
                                  random_state=15)
    
    result_dff=pd.concat([result_dff,result('ADAB_ht',adab_best,X_train_ht,y_train,X_val_ht,y_val)])
    result_dff
    
    Out[74]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
    0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
    0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
    0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
    0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
    0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
    0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
    0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
    0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
    0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
    0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947
    0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140
    0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898
    0 GB_ht 0.89602 109723.26914 12039195791.19686 64651.56480 0.96775 62595.11219 3918148069.74015 41729.66244
    0 ADAB_ht 0.89201 111821.98299 12504155880.21770 67601.23755 0.99349 28117.84264 790613074.89588 14412.70670
    In [75]:
    #Feature importance
    feat_imp(adab_best,X_train_ht)
    
                         Imp
    living_measure   0.48898
    furnished_1      0.10561
    lat              0.09726
    long             0.05701
    coast_1          0.03784
    living_measure15 0.03747
    sight            0.02427
    ceil_measure     0.02000
    yr_built         0.01993
    lot_measure15    0.01562
    zipcode          0.01534
    room_bath        0.01240
    total_area       0.01094
    lot_measure      0.00964
    basement         0.00810
    quality_9        0.00798
    quality_11       0.00571
    quality_12       0.00445
    room_bed         0.00382
    yr_renovated     0.00322
    condition        0.00291
    quality_10       0.00286
    ceil             0.00268
    quality_8        0.00253
    quality_13       0.00252
    quality_7        0.00073
    quality_6        0.00017
    
    First 8 feature importance:	 Imp   86.84492
    dtype: float64
    
    First 12 feature importance:	 Imp   93.17415
    dtype: float64
    
    <Figure size 720x720 with 0 Axes>

    XGBoost Regressor

    In [76]:
    #Regularization using GridSearchCV - 1st Iteration
    XGB_ht_1=XGBRegressor(objective='reg:squarederror')
    params1 = {
        "colsample_bytree": [i/100.0 for i in range(66,74,2)],
        "learning_rate": [0.2,0.22,0.24], 
        "n_estimators": [185,188,1],
        "subsample": [i/100.0 for i in range(62,68,1)]
    }
    XGB_GV_1 = GridSearchCV(estimator = XGB_ht_1, param_grid = params1, 
                            cv=skf,
                            verbose = 1,
                           return_train_score=True,n_jobs=2) 
    XGB_GV_1.fit(X_train_ht,y_train) 
    
    Fitting 5 folds for each of 216 candidates, totalling 1080 fits
    
    [Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
    [Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:   27.5s
    [Parallel(n_jobs=2)]: Done 257 tasks      | elapsed:  1.8min
    [Parallel(n_jobs=2)]: Done 617 tasks      | elapsed:  4.3min
    [Parallel(n_jobs=2)]: Done 1077 out of 1080 | elapsed:  7.5min remaining:    1.2s
    [Parallel(n_jobs=2)]: Done 1080 out of 1080 | elapsed:  7.5min finished
    
    Out[76]:
    GridSearchCV(cv=KFold(n_splits=5, random_state=12, shuffle=False),
           error_score='raise-deprecating',
           estimator=XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
           colsample_bynode=1, colsample_bytree=1, gamma=0,
           importance_type='gain', learning_rate=0.1, max_delta_step=0,
           max_depth=3, min_child_weight=1, missing=None, n_estimators=100,
           n_jobs=1, nthread=None, objective='reg:squarederror',
           random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
           seed=None, silent=None, subsample=1, verbosity=1),
           fit_params=None, iid='warn', n_jobs=2,
           param_grid={'colsample_bytree': [0.66, 0.68, 0.7, 0.72], 'learning_rate': [0.2, 0.22, 0.24], 'n_estimators': [185, 188, 1], 'subsample': [0.62, 0.63, 0.64, 0.65, 0.66, 0.67]},
           pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
           scoring=None, verbose=1)
    In [77]:
    # results of grid search CV
    XGB_results_1 = pd.DataFrame(XGB_GV_1.cv_results_)
    #parameters best value
    best_score_xgb_1 = XGB_GV_1.best_score_
    best_xgb_1 = XGB_GV_1.best_params_
    best_xgb_1
    
    Out[77]:
    {'colsample_bytree': 0.68,
     'learning_rate': 0.2,
     'n_estimators': 185,
     'subsample': 0.67}
    In [78]:
    #Choosing best parameter from 1st Iteration
    xgb_best_1 = XGBRegressor(colsample_bytree=0.7,learning_rate=0.22,n_estimators=186,subsample=0.65,objective='reg:squarederror',
                             random_state=16)
    
    result_dff=pd.concat([result_dff,result('xgb_1_ht',xgb_best_1,X_train_ht,y_train,X_val_ht,y_val)])
    result_dff
    
    Out[78]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
    0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
    0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
    0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
    0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
    0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
    0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
    0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
    0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
    0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
    0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947
    0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140
    0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898
    0 GB_ht 0.89602 109723.26914 12039195791.19686 64651.56480 0.96775 62595.11219 3918148069.74015 41729.66244
    0 ADAB_ht 0.89201 111821.98299 12504155880.21770 67601.23755 0.99349 28117.84264 790613074.89588 14412.70670
    0 xgb_1_ht 0.88282 116484.15375 13568558075.71564 70335.11906 0.93052 91871.93035 8440451586.23162 61776.77670
    In [79]:
    #Feature importance
    feat_imp(xgb_best_1,X_train_ht)
    
                         Imp
    furnished_1      0.45991
    living_measure   0.08495
    quality_9        0.07530
    lat              0.05495
    sight            0.04581
    coast_1          0.04306
    quality_8        0.02980
    long             0.02709
    quality_12       0.02143
    living_measure15 0.01880
    quality_6        0.01387
    quality_11       0.01343
    quality_13       0.01221
    zipcode          0.01197
    room_bath        0.01166
    yr_built         0.01035
    condition        0.01023
    quality_10       0.00940
    ceil_measure     0.00728
    lot_measure15    0.00666
    basement         0.00661
    total_area       0.00550
    ceil             0.00534
    yr_renovated     0.00469
    lot_measure      0.00348
    room_bed         0.00313
    quality_7        0.00312
    
    First 8 feature importance:	 Imp   82.08646
    dtype: float32
    
    First 12 feature importance:	 Imp   88.83791
    dtype: float32
    
    <Figure size 720x720 with 0 Axes>
    In [80]:
    #Regularization using GridSearchCV - 2nd Iteration
    
    params2 = {
        'min_child_weight':[6,7,8,9,10],"max_depth": [3,4,5],
    }
    
    xgb_best_2 = GridSearchCV(estimator = xgb_best_1, param_grid = params2, 
                            cv=skf,
                            verbose = 1,
                           return_train_score=True,n_jobs=2) 
    
    xgb_best_2.fit(X_train_ht, y_train) 
    
    # results of grid search CV
    XGB_results_2 = pd.DataFrame(xgb_best_2.cv_results_)
    XGB_results_2
    
    #parameters best value
    best_score_xgb_2 = xgb_best_2.best_score_
    best_xgb_2 = xgb_best_2.best_params_
    best_xgb_2
    
    Fitting 5 folds for each of 15 candidates, totalling 75 fits
    
    [Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
    [Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:   31.6s
    [Parallel(n_jobs=2)]: Done  75 out of  75 | elapsed:   59.3s finished
    
    Out[80]:
    {'max_depth': 5, 'min_child_weight': 7}
    In [81]:
    #Choosing best parameter from 2nd Iteration
    xgb_best_2 = XGBRegressor(colsample_bytree=0.7,learning_rate=0.22,n_estimators=186,subsample=0.65,objective='reg:squarederror',
                             random_state=17,max_depth=4,min_child_weight=8)
    result_dff=pd.concat([result_dff,result('xgb_2_ht',xgb_best_2,X_train_ht,y_train,X_val_ht,y_val)])
    result_dff
    
    Out[81]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
    0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
    0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
    0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
    0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
    0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
    0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
    0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
    0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
    0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
    0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947
    0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140
    0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898
    0 GB_ht 0.89602 109723.26914 12039195791.19686 64651.56480 0.96775 62595.11219 3918148069.74015 41729.66244
    0 ADAB_ht 0.89201 111821.98299 12504155880.21770 67601.23755 0.99349 28117.84264 790613074.89588 14412.70670
    0 xgb_1_ht 0.88282 116484.15375 13568558075.71564 70335.11906 0.93052 91871.93035 8440451586.23162 61776.77670
    0 xgb_2_ht 0.88877 113483.76321 12878564511.84933 69351.16691 0.94593 81048.97730 6568936720.98572 55372.52924
    In [82]:
    #Feature importance
    feat_imp(xgb_best_2,X_train_ht)
    
                         Imp
    furnished_1      0.46173
    quality_9        0.08325
    living_measure   0.07545
    coast_1          0.06976
    lat              0.04473
    sight            0.02880
    room_bath        0.02802
    quality_8        0.02780
    long             0.02034
    quality_10       0.01916
    quality_7        0.01823
    yr_built         0.01647
    living_measure15 0.01506
    quality_12       0.01277
    zipcode          0.01129
    quality_11       0.01026
    quality_13       0.00990
    lot_measure15    0.00685
    condition        0.00637
    ceil_measure     0.00625
    lot_measure      0.00535
    room_bed         0.00455
    total_area       0.00408
    ceil             0.00406
    basement         0.00383
    yr_renovated     0.00322
    quality_6        0.00243
    
    First 8 feature importance:	 Imp   81.95339
    dtype: float32
    
    First 12 feature importance:	 Imp   89.37298
    dtype: float32
    
    <Figure size 720x720 with 0 Axes>
    In [83]:
    #Regularization using GridSearchCV - 3rd Iteration
    
    params3 = {
        'gamma':[i/1.0 for i in range(50,55,1)]
    }
    
    xgb_best_3 = GridSearchCV(estimator = xgb_best_2, param_grid = params3, 
                            cv=skf,
                            verbose = 1,
                           return_train_score=True) 
    
    xgb_best_3.fit(X_train_ht, y_train) 
    
    # results of grid search CV
    XGB_results_3 = pd.DataFrame(xgb_best_3.cv_results_)
    XGB_results_3
    
    #parameters best value
    best_score_xgb_3 = xgb_best_3.best_score_
    best_xgb_3 = xgb_best_3.best_params_
    best_xgb_3
    
    Fitting 5 folds for each of 5 candidates, totalling 25 fits
    
    [Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
    [Parallel(n_jobs=1)]: Done  25 out of  25 | elapsed:   39.0s finished
    
    Out[83]:
    {'gamma': 50.0}
    In [84]:
    #Choosing best parameter from 3rd Iteration
    xgb_best_3 = XGBRegressor(colsample_bytree=0.7,learning_rate=0.22,n_estimators=186,subsample=0.65,objective='reg:squarederror',
                             random_state=18,max_depth=4,min_child_weight=8,reg_lambda=52)
    result_dff=pd.concat([result_dff,result('xgb_3_ht',xgb_best_3,X_train_ht,y_train,X_val_ht,y_val)])
    result_dff
    
    Out[84]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
    0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
    0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
    0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
    0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
    0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
    0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
    0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
    0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
    0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
    0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947
    0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140
    0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898
    0 GB_ht 0.89602 109723.26914 12039195791.19686 64651.56480 0.96775 62595.11219 3918148069.74015 41729.66244
    0 ADAB_ht 0.89201 111821.98299 12504155880.21770 67601.23755 0.99349 28117.84264 790613074.89588 14412.70670
    0 xgb_1_ht 0.88282 116484.15375 13568558075.71564 70335.11906 0.93052 91871.93035 8440451586.23162 61776.77670
    0 xgb_2_ht 0.88877 113483.76321 12878564511.84933 69351.16691 0.94593 81048.97730 6568936720.98572 55372.52924
    0 xgb_3_ht 0.89860 108356.33811 11741096009.16987 67404.86000 0.93004 92192.80765 8499513782.44646 60276.22610
    In [85]:
    #Feature importance
    feat_imp(xgb_best_3,X_train_ht)
    
                         Imp
    furnished_1      0.55026
    living_measure   0.11252
    coast_1          0.04955
    sight            0.04906
    lat              0.04026
    quality_8        0.03380
    long             0.01515
    quality_6        0.01480
    quality_11       0.01418
    quality_12       0.01409
    living_measure15 0.01227
    quality_9        0.00959
    zipcode          0.00910
    condition        0.00887
    quality_10       0.00752
    ceil_measure     0.00718
    yr_built         0.00665
    total_area       0.00638
    yr_renovated     0.00607
    quality_13       0.00578
    room_bath        0.00527
    room_bed         0.00497
    ceil             0.00429
    lot_measure      0.00399
    basement         0.00379
    lot_measure15    0.00335
    quality_7        0.00128
    
    First 8 feature importance:	 Imp   86.54022
    dtype: float32
    
    First 12 feature importance:	 Imp   91.55212
    dtype: float32
    
    <Figure size 720x720 with 0 Axes>

    We have executed many models and post comparing results we hyper tuned four models. All models are working well with R2 score greater than 86% RMSE is below 132600.

    But best of of all is Xtreme Gradient boost - which is enhanced version of gradient boost. It includes regularisation and is faster too. Its giving R2 score of around 89.5% with RMSE of around 109000.

    Moving forward this model can be improved further as dont have much data for very high priced houses. So when more data comes in we can revisit our model and make mecessary changes to accommodate more variation in data to deliver better results, maybe try to decrease RMSE.

    Finally lets run our model on test data, which we havent used till now and see how it performs.

    Executing xgb_3_ht on test data set

    In [86]:
    result_dff=pd.concat([result_dff,result('xgb_test',xgb_best_3,X_test_ht,y_test,X_val_ht,y_val)])
    result_dff
    
    Out[86]:
    Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
    0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
    0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
    0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
    0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
    0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
    0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
    0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
    0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
    0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
    0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
    0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
    0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
    0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
    0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
    0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947
    0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140
    0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898
    0 GB_ht 0.89602 109723.26914 12039195791.19686 64651.56480 0.96775 62595.11219 3918148069.74015 41729.66244
    0 ADAB_ht 0.89201 111821.98299 12504155880.21770 67601.23755 0.99349 28117.84264 790613074.89588 14412.70670
    0 xgb_1_ht 0.88282 116484.15375 13568558075.71564 70335.11906 0.93052 91871.93035 8440451586.23162 61776.77670
    0 xgb_2_ht 0.88877 113483.76321 12878564511.84933 69351.16691 0.94593 81048.97730 6568936720.98572 55372.52924
    0 xgb_3_ht 0.89860 108356.33811 11741096009.16987 67404.86000 0.93004 92192.80765 8499513782.44646 60276.22610
    0 xgb_test 0.87484 120381.47322 14491699093.75940 72694.97007 0.94998 78343.92038 6137769859.95983 53777.32335
    In [87]:
    #Feature importance
    feat_imp(xgb_best_3,X_test_ht)
    
                         Imp
    furnished_1      0.53507
    living_measure   0.13423
    sight            0.05095
    coast_1          0.03399
    lat              0.03394
    quality_9        0.02366
    quality_8        0.02151
    long             0.01854
    quality_7        0.01645
    ceil_measure     0.01541
    room_bath        0.01410
    living_measure15 0.01192
    condition        0.01068
    yr_renovated     0.00998
    yr_built         0.00925
    quality_11       0.00810
    zipcode          0.00625
    lot_measure15    0.00602
    total_area       0.00579
    quality_6        0.00567
    basement         0.00538
    quality_12       0.00534
    quality_10       0.00493
    lot_measure      0.00478
    ceil             0.00428
    room_bed         0.00379
    quality_13       0.00000
    
    First 8 feature importance:	 Imp   85.18983
    dtype: float32
    
    First 12 feature importance:	 Imp   90.97836
    dtype: float32
    
    <Figure size 720x720 with 0 Axes>

    CALCULATING CONFIDENCE INTERVAL ON THE FINAL SELECTED MODEL at 95% ALPHA

    In [88]:
    from sklearn.model_selection import KFold
    from sklearn.model_selection import cross_val_score
    
    num_folds = 200
    seed = 7
    
    kfold = KFold(n_splits=num_folds, random_state=seed)
    results = cross_val_score(xgb_best_3, X_test_ht, y_test, cv=kfold)
    print(results)
    print("Accuracy: %.3f%% (%.3f%%)" % (results.mean()*100.0, results.std()*100.0))
    
    [ 0.90391651  0.90916102  0.90837406  0.92102958  0.84137223  0.85154394
      0.79054082  0.97225657  0.62255882  0.95655541  0.87971233  0.54938847
      0.919873    0.90449203  0.91960521  0.82845636  0.85087562  0.84624357
      0.84949297  0.79902964  0.88093633  0.7965441   0.85767605  0.89117899
      0.87695964  0.81590065  0.77554087  0.82172976  0.89524705  0.60028268
      0.91819488  0.7676954   0.92467382  0.76400042 -0.01087648  0.94301005
      0.7988163   0.8973989   0.80375734  0.87449297  0.95865757  0.9275524
      0.9097657   0.91836083  0.92456681  0.96787804  0.8355066   0.97563326
      0.90399211  0.89793941  0.85086961  0.89391916  0.59636222  0.94398635
      0.53656514  0.87802398  0.86956142  0.86946016  0.82775075  0.90893744
      0.92036889  0.92163685  0.81946895  0.9143283   0.81252437  0.92824432
      0.75878566  0.81404196  0.87121462  0.73438774  0.80718153  0.88708332
      0.91354842  0.52667519  0.94112667  0.93731003  0.94483886  0.97033654
      0.76244928  0.93123175  0.77286008  0.87546557  0.60705664  0.72760754
      0.82665212  0.91951727  0.94649817  0.93530476  0.91908615  0.94478304
      0.93804561  0.80743798  0.95095218  0.84086034  0.94263966  0.85434296
      0.8939842   0.91195926  0.89329183  0.94217187  0.92094018  0.92534352
      0.84231454  0.80070691  0.78969709  0.89154176  0.75224552  0.98563106
      0.96707234  0.90153511  0.77089402  0.89182195  0.89960071  0.85305716
      0.94549166  0.86431631  0.85722134  0.67693538  0.90097462  0.92198301
      0.78518065  0.76819692  0.88903017  0.90340532  0.89964216  0.71263816
      0.98670033  0.85944924  0.81788499  0.90645091  0.77838803  0.86403478
      0.85040232  0.73824728  0.93391523  0.89215502  0.9170631   0.86449047
      0.81659417  0.87965375  0.89630691  0.75384405  0.91273398  0.90846708
      0.98175881  0.89090127  0.87495474  0.94566111  0.88549609  0.78429757
      0.8835784   0.83106831  0.71277922  0.92337898  0.96179742  0.70433655
      0.87525256  0.62843049  0.92354528  0.93623984  0.88524244  0.86559362
      0.78977878  0.93659078  0.92459342  0.89326338  0.77853101  0.88929344
      0.75543453  0.76270482  0.91536853  0.77264839  0.73741813  0.96582459
      0.89034114  0.81234031  0.81053727  0.86102493  0.97418468  0.94098004
      0.90470082  0.89779213  0.77860791  0.92766247  0.66861     0.30180163
      0.7851057   0.91198086  0.87794581  0.84816996  0.93551467  0.97131443
      0.93234322  0.74688263  0.69960959  0.93554804  0.94104945  0.92845367
      0.82424248  0.77653242]
    Accuracy: 85.137% (11.459%)
    
    In [89]:
    from matplotlib import pyplot
    # plot scores
    pyplot.hist(results)
    pyplot.show()
    # confidence intervals
    alpha = 0.95                     # for 95% confidence 
    p = ((1.0-alpha)/2.0) * 100      # tail regions on right and left .25 on each side indicated by P value (border)
    lower = max(0.0, np.percentile(results, p))  
    p = (alpha+((1.0-alpha)/2.0)) * 100
    upper = min(1.0, np.percentile(results, p))
    print('%.1f confidence interval %.1f%% and %.1f%%' % (alpha*100, lower*100, upper*100))
    print('Average accuracy result on test data is %.3f%%:' % (np.mean(results)*100))
    
    95.0 confidence interval 59.5% and 97.2%
    Average accuracy result on test data is 85.137%:
    
    In [92]:
    sns.set(style="darkgrid", color_codes=True)
                
    with sns.axes_style("white"):
        
        sns.jointplot(x=y_val, y=xgb_best_3.predict(X_val_ht), kind="reg", color="k")
        plt.title('Actual and Predicted', fontsize=20)       # Plot heading 
        plt.xlabel('Actual', fontsize=10)                     # X-label
        plt.ylabel('Predicted', fontsize=10)
        plt.tight_layout()
    

    Dataset-2 Final summary

    Finally we have the result, our final selected model is performing well on the test data R2 score of around 87.0% with RMSE of around 120000. </i>

    Most important feature for pricing is furnished.The furnished house is priced higher.

    Some other important features that affect price the most are living measure, latitude, above average quality of house and coastal house. So, one needs to thoroughly introspect its property on parameters suggested and list its price accordingly, similarly if one wants buy house - needs to check the features suggested above in house and calculate the predicted price. The same can than be compared to listed price.

    Dataset-1 Final summary:

  • The ensemble models have performed well compared to that of linear,KNN,SVR models
  • The best performance is given by Gradient boosting model with training (score-0.89,RMSE-81372), Validation (score-0.80,RSME-115867), Testing(score-0.79,RMSE-114695) The 95% confidence interval scores range from 0.72 to 0.85.
  • The top key features that drive the price of the property are: 'furnished_1', 'yr_built', 'living_measure','quality_8', 'HouseLandRatio', 'lot_measure15', 'quality_9', 'ceil_measure', 'total_area'.
  • The above data is also reinforced by the analysis done during bivariate analysis.
  • For further improvization, the datasets can be made by treating outliers in different ways and hypertuning the ensemble models.
  • </b>

    CONCLUSION:

    We have build different models on 2 datasets. The performance (score and 95% confidence interval scores) of the model build on dataset-1 is better than dataset-2 as the 95% confidence interval of dataset-1 is very narrow compared to that of dataset-2. Even though the score of dataset-2 model is higher, the model has very vast range of performance scores.

    The top key features to consider for pricing a property are:'furnished_1', 'yr_built', 'living_measure','quality_8', 'lot_measure15', 'quality_9', 'ceil_measure', 'total_area'. These are almost similar in both the models

    So, one needs to thoroughly introspect its property on parameters suggested and list its price accordingly, similarly if one wants buy house - needs to check the features suggested above in house and calculate the predicted price. The same can than be compared to listed price.

    For further improvization, the datasets can be made by treating outliers in different ways and hypertuning the ensemble models. Making polynomial features and improvising the model performance can also be explored further.

    Pickle file Creation

    First we will define the function for data-preprocessing that is required to run through the model. Then we will recall the same for predicting the price(target) of the property.

    The pickle file is created as per the steps followed for dataset-2.

    In [9]:
    #Defining Funcation to process all required steps as done in model
    def model(data):
        import pandas as pd   
        import numpy as np 
        
        X_test = pd.read_excel(data)
        
        #Removing outliers
        X_test_1=X_test[(X_test['living_measure']<=9000) & (X_test['price']<=4000000) & 
                          (X_test['room_bed']<=10) & (X_test['room_bath']<=6)]
                          
        cols=['cid','dayhours']
        X_test_1=X_test.drop(cols, inplace = False, axis = 1)
        
        #columns to be converted to category
        categ=['coast', 'furnished','quality']
        #X_test_2=X_test_1[categ].astype('category')
    
        # Concatenate X_test_dummy_1 variables with X_test_2
        #X_test_final = pd.concat([X_test_1, X_test_2], axis=1)
        X_test_final=X_test_1.copy()
        
        for i in range(1,2):
            X_test_final['coast_'+str(i)]=0
            X_test_final['furnished_'+str(i)]=0
        
        for i in range(1,14):
            X_test_final['quality_'+str(i)]=0
    
        for i in range(1,2):
            if ((X_test_final['coast']==i).bool()):
                X_test_final['coast_'+str(i)]=1
            
        for i in range(1,2):
            if ((X_test_final['furnished']==i).bool()):
                X_test_final['furnished_'+str(i)]=1
                    
        for i in range(1,14):
            if ((X_test_final['quality']==i).bool()):
                X_test_final['quality_'+str(i)]=1
        X_test_final=X_test_final.drop([ 'quality_3', 'quality_4', 'quality_1', 'quality_2', 'quality_5','price'],1)
        # Drop categorical variable columns
        X_test_final = X_test_final.drop(X_test_final[categ], axis=1)
    
        return X_test_final
    

    Test run on pickle file:

    In [ ]:
    import pickle
    with open('model_pickle','wb') as f:
        pickle.dump(xgb_best_3,f)
    
    In [11]:
    with open('model_pickle','rb') as f:
        mp=pickle.load(f)
    
    In [14]:
    X_test=model('innercity.xlsx')
    mp.predict(X_test)
    #X_test.columns
    
    Out[14]:
    array([314002.16], dtype=float32)

    We can see that with the given parameters, pickle file has run through the model and given predicted price of the property

    In [ ]: